Multidimensional Poverty Index by Using R

Here I will try to create a MPI using the census village amenities data for the year 2011, for maharashtra state.The complete data set can be found here

Before we move further ahead, it must be noted that I am using R Notebook, this format allows to communicate with great ease. It is also very useful to create various HTML documents while using R studio.

I reccomend that one uses R studio. Another handy resource is the book R for Data Science, this is the book that I often refer to and is update at a reasonable frequency. It talks about all the basics in R and points towards more specific resources.

Before I move forward, it is necessary to state that all resources required on creation of MPI by Alkire-Foster method were generously provided by Dr Christian Oldiges and Dr Juliana Milovich of Oxford Poverty & Human Developement Initiative. The resource that was extensively used was a Step by Step guide on Alkire-Foster method. One can access the book on Multidimensional Poverty Index using Alkire-Foster method here

All mistakes are mine alone

Importing Data and Necessary Packages

Below shown is the first R code chunk, this part is for loading necessary packages. You can install a package in R studio very easily. This can be seen in the book that was mentioned above. Download the village amenities file from the aboe given link and save it in your working directory. In the following code I have loaded tidyverse and readr packages. you can learn more about these packahes by typing ‘r ?tidyverse’ and ‘r ?readr’ in your console.

knitr:: opts_chunk$set(warning=FALSE, message=FALSE, include = TRUE)
library(tidyverse)
library(ggrepel)
library(patchwork)
DCHB <- read_csv("DCHB.csv")
View(DCHB)

Tiny Exploration

The below shown code chunk is just to see the data we have imported. It is a tiny exploratory excercise. The attach command allows us to use every column in our data as an vector and is useful and gives ease in coding as well. the View command will open the data in another window in your R studio and will present the data in a table format. The names command used with head command will give names of the first 10 columns of the data set. One can guess what the tail command does when used with names function. The size of the data set can also be visible in your envirnoment. One will notice that the data set is a 43665 x 397 matrix. This is the data on 43665 villages of Maharashtra, according to census 2011, observed across 397 variables.

The Filter function under the tidyverse allows us to choose rows with specific values, note in the below code i have seperated villages that are uninhabited, i.e. with no population, and have strored it in object called Zero_pop.The other object that is made using the same function is data of villages that are populated, i.e. with population more than 0.

attach(DCHB)

View(DCHB)

nrow(DCHB)
## [1] 43665
ncol(DCHB)
## [1] 397
head(names(DCHB),10)
##  [1] "State Code"        "State Name"        "District Code"    
##  [4] "District Name"     "Sub District Code" "Sub District Name"
##  [7] "Village Code"      "Village Name"      "CD Block Code"    
## [10] "CD Block Name"
tail(names(DCHB),10)
##  [1] "Total Unirrigated Land Area (in Hectares)"  
##  [2] "Area Irrigated by Source (in Hectares)"     
##  [3] "Canals Area (in Hectares)"                  
##  [4] "Wells/Tube Wells Area (in Hectares)"        
##  [5] "Tanks/Lakes Area (in Hectares)"             
##  [6] "Waterfall Area (in Hectares)"               
##  [7] "Other Source (specify) Area (in Hectares)"  
##  [8] "Nearest Town Name"                          
##  [9] "Nearest Town Distance from Village (in Km.)"
## [10] "X397"
DCHB %>% filter(`Total Population of Village`==0) -> Zero_pop

DCHB %>% filter(`Total Population of Village`>0) -> DCHB_hab

Choosing Relevant variables

The village Amenities data observes 397 different variables for every village of maharashtra. We will choose relevant ones to create a MPI. You can once again look at all the 397 variables in the Annexure 1 and choose relevant variables. The following chunk will do the same.

DCHB_hab[,c(4,6,8,25,26,35:38, 42:45, 49:52, 56:59, 63:66, 133:138,139:144,145:150
                      ,151:156, 208:237, 240,244, 250, 262,263, 266,272,273,276,277,
                      302,303,308, 309, 314:325,358:366)] -> Data_MPI

names(Data_MPI) -> Relevant_Variables

detach(DCHB)

attach(Data_MPI)

Classifying the Relevant Variables

Following table shows broad classification of the relavant vaiables in themes. Accordingly we can make indicators by attaching appropriate weights. Consider the following themes and see the vaiables in each theme from Annexure 2.

Theme Columns (Range)
Education 6-25
Health 26-49
Access to Water 50-78
Sanitation and Waste Disposal 79-81
Connectivity 82-92
Banking and Access to credit 93-100
Electricity 105-113

Creating Indicators for Education

Using the Relevant Variable columns for Education (6-25), we will create indicator for Education theme. It is clear from looking at the varibles that school education is in focus, I believe at village level this is much more important than looking at access to college education even though it holds vital relevance. I say so because school education is under direct control of State Governments and Zila Parishad and local governments can play larger role in school education.

Indicators can be divided under Education Theme, take a look at he following table:

Broad divisions under Education Weights
Pre Primary 10
Primary 30
Middle School 25
Secondary 25
senior Scondary 10

Further on, we will add private and government schools division wise and will say that a village with inhabitants is deprived in a particular division by using the following key

Division criteria 1
Pre Primary deprived if no school for HHs > 50
Primary deprived if no school for HHs > 100
Middle School deprived if no school for HHs > 200
Secondary deprived if no school for HHs > 500
senior Scondary deprived if no school for HHs > 1000

Following is the code1 for doing the above:

replace_na(`Govt Pre-Primary School (Nursery/LKG/UKG) (Status A(1)/NA(2))`,
           0) -> `Govt Pre-Primary School (Nursery/LKG/UKG) (Status A(1)/NA(2))`

replace_na(`Private Pre - Primary School (Nursery/LKG/UKG) (Status A(1)/NA(2))`,
           0) -> `Private Pre - Primary School (Nursery/LKG/UKG) (Status A(1)/NA(2))`

replace_na(`Govt Primary School (Status A(1)/NA(2))`,
           0) -> `Govt Primary School (Status A(1)/NA(2))`

replace_na(`Private Primary School (Status A(1)/NA(2))`,
           0) ->`Private Primary School (Status A(1)/NA(2))` 

replace_na(`Govt Middle  School (Status A(1)/NA(2))`,
           0) -> `Govt Middle  School (Status A(1)/NA(2))`

replace_na(`Private Middle  School (Status A(1)/NA(2))`,
           0) -> `Private Middle  School (Status A(1)/NA(2))`

replace_na(`Govt Secondary School (Status A(1)/NA(2))`,
           0) -> `Govt Secondary School (Status A(1)/NA(2))`

replace_na(`Private Secondary School (Status A(1)/NA(2))`,
           0) -> `Private Secondary School (Status A(1)/NA(2))`

replace_na(`Govt Senior Secondary School (Status A(1)/NA(2))`,
           0) -> `Govt Senior Secondary School (Status A(1)/NA(2))`

replace_na(`Private Senior Secondary School (Status A(1)/NA(2))`,
           0) -> `Private Senior Secondary School (Status A(1)/NA(2))`

Data_MPI %>% 
  mutate(
    tot_prepri = `Govt  Pre - Primary School (Nursery/LKG/UKG) (Numbers)` +
      `Private Pre - Primary School (Nursery/LKG/UKG) (Numbers)`,
    
    flag_prepri = ifelse((tot_prepri == 0 & `Total   Households`>50), 1, 0),
    
    tot_pri = `Govt Primary School (Numbers)` + 
      `Private  Primary School (Numbers)`,
    
    flag_pri = ifelse((tot_pri == 0 & `Total   Households`>100), 1,0),
    
    tot_middle = `Govt  Middle School (Numbers)` + 
      `Private Middle School (Numbers)`,
    
    flag_middle = ifelse((tot_middle == 0 & `Total   Households`>200),1,0),
    
    tot_sec = `Govt  Secondary School (Numbers)` +
      `Private Secondary  School (Numbers)`,
    
    flag_sec = ifelse((tot_sec == 0 & `Total   Households`>500), 1, 0),
    
    tot_sensec = `Govt Senior Secondary School (Numbers)` +
      `Private Senior Secondary  School (Numbers)`,
    
    flag_sensec = ifelse((tot_sensec==0 & `Total   Households`>1000),1,0),
    
    sch_cover = (tot_sensec + tot_sec + tot_middle + 
                   tot_pri + tot_prepri)/`Total   Households`,
    
    EDU_ind = sch_cover + (.10*flag_sensec) + (.25*flag_sec) + (0.25*flag_middle) +
      (0.30*flag_pri) + (0.10*flag_prepri)) -> Data_MPI

Creating Indicator for Health

For creating an indicator for health, from the available data we observe the ratio of total number of vacant positions of doctors and paramedical staff at any medical service centre with respect to total number of Households, since this number can be very small, we will transform the ration by log. Recall, for Health theme, the columns from 26-49 are relevant.

Data_MPI %>%
  mutate( 
    Heal_ind = 
      (
        `Community Health Centre Doctors Total Strength (Numbers)` + 
         `Community Health Centre Para Medical  Staff Total Strength (Numbers)` + 
         `Primary Health Centre  Doctors Total Strength (Numbers)` + 
         `Primary  Health Centre Para Medical  Staff Total Strength (Numbers)` +
         `Primary Heallth Sub Centre Doctors Total Strength (Numbers)` + 
         `Primary Heallth Sub Centre Para Medical  Staff Total Strength (Numbers)` +
         `Maternity And Child Welfare Centre Doctors Total Strength (Numbers)` + 
         `Maternity And Child Welfare Centre Para Medical  Staff Total Strength (Numbers)` -
         `Community Health Centre Doctors In Position (Numbers)` - 
         `Community Health Centre Para Medical Staff In Position (Numbers)` -
         `Primary Health  Centre Doctors In Position (Numbers)` - 
         `Primary  Health Centre Para Medical Staff In Position (Numbers)` - 
         `Primary Heallth Sub Centre Doctors In Position (Numbers)` -
         `Primary Heallth Sub Centre Para Medical Staff In Position (Numbers)` -
         `Maternity And Child Welfare Centre Doctors In Position (Numbers)` - 
         `Maternity And Child Welfare Centre Para Medical Staff In Position (Numbers)`)) -> 
  Data_MPI

The above code generates on NA and to adjust the range to positive integers, I will add 10 to each observation.

a <- c(-10,-8,0,1,2,3,4,5,6,7,8,9,10,11)

Data_MPI %>% filter(Heal_ind %in% a) -> Data_MPI

Data_MPI %>%mutate(Heal_ind = Heal_ind+10) -> Data_MPI

One can argue that there is a difference of approach while choosing the indicators for Education and Health. While defining the Education indicator the approach was such that it somewhere questioned the number of schools that are present in a village and on top of it tried to see the spread of number of schools across the number of households in a village. However, while defining the indicaor for Health I have taken the presence of any helathcare service delivery infrastructure as given and assesed the deprivation as a form of absence of designated staff. This is a valid argument. I am doing so to show what different approaches can be taken.

Creating Indicator for Access to water

For creating an indicator for Access to water, we will create a deprivation score depending on the source of water and seasonality, higher the score, higher the deprivation. Consider the following table:

Source of Water Deprivation Score
- Treated Tap water 0
- untreated Tap water 10
- Covered well 20
- Uncovered well 30
- Handpump 40
- Tube well 50
- Spring 60
- river 70
- tank/Pond/Lake 80
- Other 90

The first step that we will do is create a column vector to our data frame that, regardless of the type source,shows if a village gets water around the year from at least one source. Here we also make an assumption that villages that are receiving water from aqt least one source, regardless of the type of the source, around the year are less deprived than those that receive water from different source in different seasons.

Data_MPI %>% 
  mutate(
    all_Seasons = 
      any(
        `Tap Water-Treated Functioning All round the year (Status A(1)/NA(2))` ==1,
        `Tap Water Untreated Functioning All round the year (Status A(1)/NA(2))` == 1,
        `Covered Well Functioning All round the year (Status A(1)/NA(2))` == 1,
        `Uncovered  Well Functioning All round the year (Status A(1)/NA(2))` == 1,
        `Hand Pump Functioning All round the year (Status A(1)/NA(2))` == 1,
        `Tube Wells/Borehole Functioning All round the year (Status A(1)/NA(2))` == 1,
        `Spring Functioning All round the year (Status A(1)/NA(2))` == 1,
        `River/Canal Functioning All round the year (Status A(1)/NA(2))` == 1,
        `Tank/Pond/Lake Functioning All round the year (Status A(1)/NA(2))` == 1,
        `Others Functioning All round the year (Status A(1)/NA(2))` == 1))-> Data_MPI

Since we have incorporated a vector to see if water reaches a village around the year from at least 1 source. It is also of importance to understand that sources of water matter as well, this is because treated tap water is relatively more safe as compared to water from a tank or pond. In the following code chunk, this is what we will attempt.

ifelse(`Tap Water-Treated (Status A(1)/NA(2))` == 1,0,0)-> 
  `Tap Water-Treated (Status A(1)/NA(2))`

ifelse(`Tap Water Untreated (Status A(1)/NA(2))` == 1,10,0) -> 
  `Tap Water Untreated (Status A(1)/NA(2))`

ifelse(`Covered Well (Status A(1)/NA(2))` == 1,20,0) -> 
  `Covered Well (Status A(1)/NA(2))`

ifelse(`Uncovered  Well (Status A(1)/NA(2))` == 1,30,0) -> 
  `Uncovered  Well (Status A(1)/NA(2))`

ifelse(`Hand Pump (Status A(1)/NA(2))` == 1,40,0) -> 
  `Hand Pump (Status A(1)/NA(2))`

ifelse(`Tube Wells/Borehole (Status A(1)/NA(2))` == 1,50,0) -> 
  `Tube Wells/Borehole (Status A(1)/NA(2))`

ifelse(`Spring (Status A(1)/NA(2))` == 1,60,0) -> 
  `Spring (Status A(1)/NA(2))`

ifelse(`River/Canal (Status A(1)/NA(2))` == 1,70,0) -> 
  `River/Canal (Status A(1)/NA(2))`

ifelse(`Tank/Pond/Lake (Status A(1)/NA(2))` == 1,80,0) -> 
  `Tank/Pond/Lake (Status A(1)/NA(2))`

ifelse(`Others (Status A(1)/NA(2))` == 1,90,0) -> 
  `Others (Status A(1)/NA(2))`

Data_MPI %>% 
  mutate(
    water_indi = log10(
      `Tap Water-Treated (Status A(1)/NA(2))` +
        `Tap Water Untreated (Status A(1)/NA(2))` +
        `Covered Well (Status A(1)/NA(2))` + 
        `Uncovered  Well (Status A(1)/NA(2))` +
        `Hand Pump (Status A(1)/NA(2))` + 
        `Tube Wells/Borehole (Status A(1)/NA(2))` +
        `Spring (Status A(1)/NA(2))` + 
        `River/Canal (Status A(1)/NA(2))` +
        `Tank/Pond/Lake (Status A(1)/NA(2))` + 
        `Others (Status A(1)/NA(2))`) - all_Seasons) -> Data_MPI

Creating Indicator for Waste Disposal and Sanitaion

There are three points that we look out for to create this indicator.

  • Is there a drainage system available in the village? If not, We socre this village in such a manner that it is depicted as deprived.

  • If there is a drainage system, where does the discharge gets released?

  • Is there a community wate disposal system?

Following is the code chunk to create an indicator for sanitaion and waste disposal. We will make two different Indicators Sanitaion (Drainage) and waste disposal.

Data_MPI %>% 
  mutate( 
   
     Drain_Indi = 
      ifelse(
        `No  Drainage (Status A(1)/NA(2))` == 1,100,
        ifelse(
`Whether Drain water is discharged directly into water bodies or to sewar plant (For Water Bodies-1/Sewar Plants-2)` == 1,50,0)),
   
 Waste_Indi = 
   ifelse(`Community waste disposal system after house to house collection (Status A(1)/NA(2))` == 
               1,0,100)) ->Data_MPI

Creating Indicator for Connectivity

For Creating indicator for connectivity, we will consider access to Telephone, mobile, public Bus Service, Railways, Black Topped roads and All Weather roads.

Consider the following table for weights

Type Weights
Telephone 20
Mobile 30
Public Bus Service 30
Railways 05
Black topped roads 10
All Weather Roads 05

Although we do have data that shows how far is this service accessable, if not available in a particular village, I am not differentiating because these are such basic services that need to be provided regardless of any constraints. This can be debated.

ifelse(`Telephone (landlines) (Status A(1)/NA(2))` == 2,0,1) -> 
  `Telephone (landlines) (Status A(1)/NA(2))`

ifelse(`Mobile Phone Coverage (Status A(1)/NA(2))` == 2,0,1) -> 
  `Mobile Phone Coverage (Status A(1)/NA(2))`

ifelse(`Public Bus Service (Status A(1)/NA(2))` == 2,0,1) -> 
  `Public Bus Service (Status A(1)/NA(2))`

ifelse(`Railway Station (Status A(1)/NA(2))` == 2,0,1) -> 
  `Railway Station (Status A(1)/NA(2))`

ifelse(`Black Topped (pucca) Road (Status A(1)/NA(2))` == 2,0,1) -> 
  `Black Topped (pucca) Road (Status A(1)/NA(2))`

ifelse(`All Weather Road (Status A(1)/NA(2))` == 2,0,1) -> 
  `All Weather Road (Status A(1)/NA(2))`

Data_MPI %>% 
  mutate(
    connectivity_ind = (0.25*`Telephone (landlines) (Status A(1)/NA(2))`) +
      (0.25*`Mobile Phone Coverage (Status A(1)/NA(2))`) +
      (0.05*`Railway Station (Status A(1)/NA(2))`) +
      (0.10* `Black Topped (pucca) Road (Status A(1)/NA(2))`)+
      (0.05*`All Weather Road (Status A(1)/NA(2))`)) -> Data_MPI

Creating Indicator for Banking and Access to Credit

A village having a commercial bank is subject to the condition if a commercial banks finds it lucrative enough to open up shop in that particular village.It is more likely that a local cooperative bank or a agricultural credit scoiety is formed and is primary banking service provider. Following are the weights associated to different types of Institutions/infrastructure.

Type Weights
ATM less than 5Km 04
ATM between 5-10Km 06
ATm more than 10Km 10
Commercial Bank 20
Coooperative Bank 30
Ageicultural Credit Society 30
Data_MPI %>% 
  mutate( 
    Atm_In = ifelse(
      `ATM (Status A(1)/NA(2))` == 1,0,
      ifelse(`Nearest Facility Distance (in Km)` == "a",0.04,
             ifelse(`Nearest Facility Distance (in Km)` == "b", 0.06,0.10))),
    
    Com_bank = ifelse(`Commercial Bank (Status A(1)/NA(2))` == 1, 0, 0.2),
    
    coop_bank = ifelse(`Cooperative Bank (Status A(1)/NA(2))` == 1,0,0.3),
    
    Agr_credit = ifelse(`Agricultural Credit Societies (Status A(1)/NA(2))` == 1,0,0.3),
    
    Banking_Credit_Indi = Atm_In+Com_bank+coop_bank+Agr_credit) -> Data_MPI

Creating Indicator for Access to Electricity

Dividing the three type of electricity connections and weighting accordingly.

Type Weights
Domestic Use 35
Agricultural Use 35
Commercial Use 30
Data_MPI %>% 
  mutate( 
    dom = 
      ifelse(`Power Supply For Domestic Use  (Status A(1)/NA(2))` == 1,
             (`Power Supply For Domestic Use Summer (April-Sept.) per day (in Hours)` +
                `Power Supply For Domestic Use Winter (Oct.-March) per day (in Hours)`)/2,24),
    Agi =
      ifelse(`Power Supply For Agriculture Use (Status A(1)/NA(2))` == 1,
             (`Power Supply For Agriculture Use Summer (April-Sept.) per day (in Hours)` +
                `Power Supply For Agriculture Use Winter (Oct.-March)per day (in Hours)`)/2,24),
    Com = 
      ifelse(`Power Supply For Commercial Use (Status A(1)/NA(2))` == 1,
             (`Power Supply For Commercial Use Summer (April-Sept.) per day (in Hours)` +
                `Power Supply For Commercial Use Winter (Oct.-March) per day (in Hours)`)/2,24),
    ele_ind =
      dom+Agi+Com) -> Data_MPI 

AF Method for creation of MPI

Following code chunk will create a subset of the Data_MPI object, the new object will only have the indicators and name of vilage corresponding to their disrticts and Tehsils.

Data_MPI %>% dplyr::select(contains("_ind"), 1:5) -> Data_AF

detach(Data_MPI)

attach(Data_AF)

Since all the Indicators have different scales, we will now normalize these indicators and graphically look at their Distribution.

Data_AF %>%
  mutate(EDU_ind = EDU_ind/(max(EDU_ind)-min(EDU_ind)),
         
         Heal_ind = Heal_ind/(max(Heal_ind)-min(Heal_ind)),
         
         water_indi = water_indi/(max(water_indi)-min(water_indi)),
         
         Drain_Indi = Drain_Indi/(max(Drain_Indi)-min(Drain_Indi)),
         
         Waste_Indi = Waste_Indi/(max(Waste_Indi)-min(Waste_Indi)),
         
         connectivity_ind = 
           connectivity_ind/(max(connectivity_ind)-min(connectivity_ind)),
         
         Banking_Credit_Indi = 
           Banking_Credit_Indi/(max(Banking_Credit_Indi)-min(Banking_Credit_Indi)),
         
         ele_ind = ele_ind/(max(ele_ind)-min(ele_ind))) ->Data_AF

Data_AF %>% gather(Indicator, Values, 1:8) %>% 
  ggplot(aes(Indicator, Values)) +
  geom_boxplot(aes(color = Indicator)) +
  theme_grey() +
  guides(color = FALSE)+
  coord_flip()

Now that we have seen the distribution of the indicators, let us see the correlation between the indicators.

library(corrplot)

library(ggcorrplot)

cor(Data_AF[,1:8])->ob_cor

ob_cor %>% ggcorrplot(type = "lower", lab = TRUE)

Starting with the A-F method

The first thing that we do is to decide cutoffs. In the following code chunks we will decide the cutoffs for all the Indicators.

let us now look at the mean and the median of the indicators when grouped by Districts.

Data_AF %>% gather(Indicator, Values, 1:8) %>% 
  ggplot(aes(Values, colour = Indicator))+
  geom_freqpoly()+
  facet_wrap(~Indicator)+
  guides(colour =FALSE)

The above frequency plot shows how many villages take a particular value for each of the indicators. While deciding the cutoffs it is important to keep in mind not to choose a cutoff that is too high, this will mark every village as deprived. The flip side to this is that not to choose a cutoff that is too low, here one faces the risk of marking every village as not deprived.

The MPI as a tool allows one to present multiple deprivation of each observation in a population. The standard of what can be difined as a deprived or not deprived observation is a complex choice that has to be made by politicians, administrators and experts in cooperation with each other. Depending on this choice and definition, various targeted policies or schemes can be designed. Deciding the cutoffs is also one of these choices. This is a simplistic explanation of what the A-F method calls Union and Intersection approach.

Here, I will be choosing the cutoffs by looking at the frequency plot and try to make reasonable choice.

cutoff <- c(0.10, 0.5, 0.75, 0.75, 1, 1.25, 0.75, 1)

Creating the Deprivation Matrix

The deprivation matrix is obtained by creating a vector that shows wether a particular observation for an indicator is less than the cutoff. If less than the cutoff, that observation is considered to be deprived in that indicator. The following code chunk creates the deprivation matrix. We will call it g0

Data_AF %>%
  mutate(EDU_dep = ifelse(EDU_ind < 0.25, 1, 0),
         Heal_dep = ifelse(Heal_ind < 0.5, 1, 0),
         Water_dep = ifelse(water_indi < 0.75, 1, 0),
         Drain_dep = ifelse(Drain_Indi < 0.75, 1, 0),
         Waste_dep = ifelse(Waste_Indi < 1, 1, 0),
         Connect_dep = ifelse(connectivity_ind < 1.25, 1, 0),
         Banking_dep = ifelse(Banking_Credit_Indi < 0.75, 1, 0),
         Ele_dep = ifelse(ele_ind < 1, 1, 0)) %>% 
  dplyr::select(ends_with("dep"),`District Name`:`Total Population of Village`) -> G_0

head(G_0)
detach(Data_AF)

Weighted Deprivation Matrix

Using the deprivation matrix,in the following code chunk, we will form the weighted Deprivation Matrix. We will use the following weights, as described in the table below:

Theme (No. of Indicators) Weight
Education(1) 0.20
Health (1) 0.20
Access to Water (1) 0.10
Sanitation and Waste Disposal(2) 0.075 (each)
Connectivity (1) 0.10
Banking and Access to credit(1) 0.10
Electricity(1) 0.15
w = c(0.20, 0.20, 0.10, 0.075, 0.075, 0.10, 0.10, 0.15)

attach(G_0)

G_0 %>% mutate( EDU_w = ifelse(EDU_dep != 0, 0.20, 0),
                Heal_w = ifelse(Heal_dep != 0, 0.20, 0),
                Water_w = ifelse(Water_dep != 0, 0.10, 0),
                Drain_w = ifelse(Drain_dep != 0, 0.075, 0),
                Waste_w = ifelse(Waste_dep != 0, 0.075, 0),
                Connect_w = ifelse(Connect_dep != 0, 0.10, 0),
                Banking_w = ifelse(Banking_dep != 0, 0.10, 0),
                Ele_w = ifelse(Ele_dep != 0, 0.15, 0)) %>% 
  dplyr::select(ends_with("w"),`District Name`:`Total Population of Village`) -> G_0_bar

head(G_0_bar)
detach(G_0)

attach(G_0_bar)
Uncensored HeadCount

For every village we will add the deprivation score for each corresponding indicator, consequently adding a vector to the weighted derivation matrix. We will add another vector that will indicate if a particular village has 0 deprivations or not, in essence the sum of this vector will be the uncensored headcount.

G_0_bar %>% 
  mutate( dep_sum = EDU_w + Heal_w + Water_w + Drain_w + 
            Waste_w + Connect_w + Banking_w + Ele_w,
          UncensoredHC = ifelse(dep_sum !=0, 1, 0)) -> G_0_bar

print("the uncensored headcount is:") 
## [1] "the uncensored headcount is:"
sum(G_0_bar$UncensoredHC)
## [1] 40958
print("total number of villages: ") 
## [1] "total number of villages: "
nrow(G_0_bar)
## [1] 40958

As mention before this leaves only 5 villages across Maharashtra that are not depreived in any of the themes. This is the problem with the union approach. We will set a second cutoff by using the freshly brewed variable dep_sum.

We will set another cutoff, k. say if k is set at 40%, or euqal to 0.4, this would mean a village is multidimensionaly deprived if the dep_sum of that village is greater than or equal to k.

k = 0.50


G_0_bar %>% 
  mutate(dep_sum_k = ifelse(dep_sum >= k, dep_sum, 0),
                   censoredHC_k = ifelse(dep_sum_k != 0, 1, 0),
                   EDU_w_k = ifelse(dep_sum_k == 0, dep_sum_k, EDU_w),
                   Heal_w_k = ifelse(dep_sum_k == 0, dep_sum_k, Heal_w),
                   Water_w_k = ifelse(dep_sum_k == 0, dep_sum_k, Water_w),
                   Drain_w_k = ifelse(dep_sum_k == 0, dep_sum_k, Drain_w),
                   Waste_w_k = ifelse(dep_sum_k == 0, dep_sum_k, Waste_w),
                   Connect_w_k = ifelse(dep_sum_k == 0, dep_sum_k, Connect_w),
                   Banking_w_k = ifelse(dep_sum_k == 0, dep_sum_k, Banking_w),
                   Ele_w_k = ifelse(dep_sum_k == 0, dep_sum_k, Ele_w)) %>% 
  dplyr::select(`District Name`: `Total Population of Village`, ends_with("k")) -> G_0_bar_k

detach(G_0_bar)

attach(G_0_bar_k)

Now that we have obtained the desired censored matrix, next step is to claculate the H,Headcount ratio, A (Intensity) and M0 (MPI) for Maharashtra state. We will define a function that will help calcualte the MPI, it will take two arguments, the censored matrix and the weights that were assigned to the themes.

Calc_MPI <- function(x,y) {
  
x %>% filter(x$dep_sum_k != 0) %>% nrow()-> count_k

HeadCountRatio_k = count_k/nrow(x)  

Intensity_k = sum(x$dep_sum_k)/sum(y*count_k)

M = HeadCountRatio_k * Intensity_k

MPI <- data.frame(H_K = count_k, H_k_ratio = HeadCountRatio_k,
                  A_k = Intensity_k, MPI = M)

return(MPI)
}

Calc_MPI(G_0_bar_k,w) ->mah_mpi

mah_mpi

It is clear that 35390 villages of Maharashtra are Multidimensionally Deprived, i.e. nearly 86% of the villages of Maharashtra. The intensity is around 74%, at k= 40% or more. The MPI of the state is 0.64

Decomposition and Breakdowns

Population Sub Group: District Level

The following code chunk will create a MPI for each district. Further, we will check the contribution of each district.

G_0_bar_k %>% group_by(`District Name`) %>% 
  summarise(Dis_dep_sum_k = sum(dep_sum_k), 
            total_villages = n(),
            Dis_headcount_k = length(dep_sum_k[dep_sum_k!=0]),
            Dis_headcountRatio_k = Dis_headcount_k/total_villages,
            Dis_intensity_k = Dis_dep_sum_k/sum(Dis_headcount_k * w),
            Dis_MPI = Dis_headcountRatio_k * Dis_intensity_k,
            Dis_Contribution  = (total_villages*Dis_MPI)/(40958* mah_mpi$MPI),
            Dis_EDU_headcount_k = length(EDU_w_k[EDU_w_k !=0]),
            Dis_Heal_headcount_k = length(Heal_w_k[Heal_w_k!=0]),
            Dis_water_headcount_k = length(Water_w_k[Water_w_k!=0]),
            Dis_Drain_headcount_k = length(Drain_w_k[Drain_w_k!=0]),
            Dis_waste_headcount_k = length(Waste_w_k[Waste_w_k!=0]),
            Dis_Connect_headcount_k = length(Connect_w_k[Connect_w_k!=0]),
            Dis_Banking_headcount_k = length(Banking_w_k[Banking_w_k!=0]),
            Dis_Ele_headcount_k = length(Ele_w_k[Ele_w_k!=0])) -> District_break

detach(G_0_bar_k)

attach(District_break)

We will perform a minor check by seeing if the total contruibution of all the districts comes to 1, i.e. 100%

sum(District_break$Dis_Contribution)
## [1] 1

Let us now try and visualize this population subgroup decomposition

p1<-District_break %>%
  ggplot(aes(Dis_headcountRatio_k, Dis_intensity_k, 
             alpha =0.5,label = `District Name`))+
  geom_jitter(aes(color = as.factor(round(Dis_MPI,digits = 1)), 
                  size =round(Dis_Contribution,digits = 2)))+
  geom_text_repel()+
  scale_color_viridis_d()+
  labs(x = "Headcount Ratio", y= "Itensity",
       color = "MPI Value", size = "District's contribution to State MPI")+
  guides(alpha = FALSE)

Dimensional Breakdown

In this section we will try and understand which dimension, or what I have called theme, contributes how much to the MPI. The contribution of each dimension is the product of the ratio of the censored headcount of that dimension and MPI of entire Population and the weight of that dimension. Say for Dimension 1, h is the censored headcount ratio, M is MPI of entire population, r= h/m. And w is the weight of Dimension 1. Then contribution of Dimension 1 is equal to w*r. Here, I have assigned weights in a manner where sum of weights of all the dimensions comes to be 1. In case this sum is equal to the number of dimensions, the the formula for contribution of a particular dimension would be (w/d)r. Where d is the total number of dimensions.

District_break %>% 
  mutate( Dis_Contri_EDU = w[1]*(Dis_EDU_headcount_k/(Dis_MPI * total_villages)),
          Dis_Contri_Heal = w[2] * (Dis_Heal_headcount_k/(Dis_MPI * total_villages)),
          Dis_Contri_Water = w[3] * (Dis_water_headcount_k/(Dis_MPI * total_villages)),
          Dis_Contri_Drain = w[4] * (Dis_Drain_headcount_k/(Dis_MPI * total_villages)),
          Dis_Contri_Waste = w[5] * (Dis_waste_headcount_k/(Dis_MPI * total_villages)),
          Dis_Contri_Connect = w[6] * (Dis_Connect_headcount_k/(Dis_MPI * total_villages)),
          Dis_Contri_Banking = w[7] * (Dis_Banking_headcount_k/(Dis_MPI * total_villages)),
          Dis_Contri_Ele = w[8] * (Dis_Ele_headcount_k/(Dis_MPI * total_villages))) ->District_break


p2<-District_break %>%  
  dplyr::select(`District Name`, Dis_Contri_EDU : Dis_Contri_Ele ) %>% 
  gather(Contribution, value, 2:9) %>% 
  ggplot(aes(`District Name`, value, fill = Contribution, alpha = 0.5))+
  geom_col()+
  scale_fill_brewer(palette = "Set3")+
  theme(axis.text.x = element_text(angle = 90))+
  labs(title = "Dimnesional Contribution For Each District", y ="Contribution")+
  guides(alpha = FALSE)

p1/p2

library(maptools)
library(raster)
library(rgdal)
library(sp)
library(plotly)



#changing column name of district with NAME_1 inorder to joing the data and shape file
# Also Changing the name of Gadchiroli district to Garhchiroli, as in the shape file, even though it is incorrectly spelt.

names(District_break)->c
c[1]<-"NAME_2"
colnames(District_break)<-c
replace(District_break$NAME_2,10,"Garhchiroli")->District_break$NAME_2



India<- getData(name = "GADM", country = "IN", level = 2)

India<- spTransform(India, CRS("+init=EPSG:24381"))

Maharashtra<- India[India@data$NAME_1 == "Maharashtra",]

Maharashtra@data$id = rownames(Maharashtra@data)
Maharashtra@data <- plyr::join(Maharashtra@data, District_break, by= "NAME_2")

Maharashtra_fortified<- fortify(Maharashtra)

Maharashtra_fortified<- plyr::join(Maharashtra_fortified, Maharashtra@data,  by="id")

##MPI at District level
ggplot(Maharashtra_fortified, aes(x=long,y=lat, group = group, label = Maharashtra_fortified$NAME_2))+
  geom_polygon(aes(fill = Dis_MPI), colour = "red")+
  ggplot2::scale_fill_viridis_c() ->p3



ggplotly(p3) %>% 
  highlight(
    "plotly_hover",
    selected = attrs_selected(line= list(color = "red"))

  )
##Headcount Ratio at Dsitrict Level
ggplot(Maharashtra_fortified, aes(x=long,y=lat, group = group, label = Maharashtra_fortified$Dis_headcountRatio_k))+
  geom_polygon(aes(fill = Dis_headcountRatio_k), colour = "red")+
  ggplot2::scale_fill_viridis_c() ->p4



ggplotly(p4) %>% 
  highlight(
    "plotly_hover",
    selected = attrs_selected(line= list(color = "red"))
    

  )
## INtensity at District Level

ggplot(Maharashtra_fortified, aes(x=long,y=lat, group = group, label = Maharashtra_fortified$Dis_intensity_k))+
  geom_polygon(aes(fill = Dis_intensity_k), colour = "red")+
  ggplot2::scale_fill_viridis_c() ->p5



ggplotly(p5) %>% 
  highlight(
    "plotly_hover",
    selected = attrs_selected(line= list(color = "red"))
  

  )

  1. The commands written to replace NA are something like a contingency. In case there is a NA value and I have missed it, it should not affect the addition.