Here I will try to create a MPI using the census village amenities data for the year 2011, for maharashtra state.The complete data set can be found here
Before we move further ahead, it must be noted that I am using R Notebook, this format allows to communicate with great ease. It is also very useful to create various HTML documents while using R studio.
I reccomend that one uses R studio. Another handy resource is the book R for Data Science, this is the book that I often refer to and is update at a reasonable frequency. It talks about all the basics in R and points towards more specific resources.
Before I move forward, it is necessary to state that all resources required on creation of MPI by Alkire-Foster method were generously provided by Dr Christian Oldiges and Dr Juliana Milovich of Oxford Poverty & Human Developement Initiative. The resource that was extensively used was a Step by Step guide on Alkire-Foster method. One can access the book on Multidimensional Poverty Index using Alkire-Foster method here
All mistakes are mine alone
Below shown is the first R code chunk, this part is for loading necessary packages. You can install a package in R studio very easily. This can be seen in the book that was mentioned above. Download the village amenities file from the aboe given link and save it in your working directory. In the following code I have loaded tidyverse and readr packages. you can learn more about these packahes by typing ‘r ?tidyverse’ and ‘r ?readr’ in your console.
knitr:: opts_chunk$set(warning=FALSE, message=FALSE, include = TRUE)
library(tidyverse)
library(ggrepel)
library(patchwork)
DCHB <- read_csv("DCHB.csv")
View(DCHB)
The below shown code chunk is just to see the data we have imported. It is a tiny exploratory excercise. The attach command allows us to use every column in our data as an vector and is useful and gives ease in coding as well. the View command will open the data in another window in your R studio and will present the data in a table format. The names command used with head command will give names of the first 10 columns of the data set. One can guess what the tail command does when used with names function. The size of the data set can also be visible in your envirnoment. One will notice that the data set is a 43665 x 397 matrix. This is the data on 43665 villages of Maharashtra, according to census 2011, observed across 397 variables.
The Filter function under the tidyverse allows us to choose rows with specific values, note in the below code i have seperated villages that are uninhabited, i.e. with no population, and have strored it in object called Zero_pop.The other object that is made using the same function is data of villages that are populated, i.e. with population more than 0.
attach(DCHB)
View(DCHB)
nrow(DCHB)
## [1] 43665
ncol(DCHB)
## [1] 397
head(names(DCHB),10)
## [1] "State Code" "State Name" "District Code"
## [4] "District Name" "Sub District Code" "Sub District Name"
## [7] "Village Code" "Village Name" "CD Block Code"
## [10] "CD Block Name"
tail(names(DCHB),10)
## [1] "Total Unirrigated Land Area (in Hectares)"
## [2] "Area Irrigated by Source (in Hectares)"
## [3] "Canals Area (in Hectares)"
## [4] "Wells/Tube Wells Area (in Hectares)"
## [5] "Tanks/Lakes Area (in Hectares)"
## [6] "Waterfall Area (in Hectares)"
## [7] "Other Source (specify) Area (in Hectares)"
## [8] "Nearest Town Name"
## [9] "Nearest Town Distance from Village (in Km.)"
## [10] "X397"
DCHB %>% filter(`Total Population of Village`==0) -> Zero_pop
DCHB %>% filter(`Total Population of Village`>0) -> DCHB_hab
The village Amenities data observes 397 different variables for every village of maharashtra. We will choose relevant ones to create a MPI. You can once again look at all the 397 variables in the Annexure 1 and choose relevant variables. The following chunk will do the same.
DCHB_hab[,c(4,6,8,25,26,35:38, 42:45, 49:52, 56:59, 63:66, 133:138,139:144,145:150
,151:156, 208:237, 240,244, 250, 262,263, 266,272,273,276,277,
302,303,308, 309, 314:325,358:366)] -> Data_MPI
names(Data_MPI) -> Relevant_Variables
detach(DCHB)
attach(Data_MPI)
Following table shows broad classification of the relavant vaiables in themes. Accordingly we can make indicators by attaching appropriate weights. Consider the following themes and see the vaiables in each theme from Annexure 2.
| Theme | Columns (Range) |
|---|---|
| Education | 6-25 |
| Health | 26-49 |
| Access to Water | 50-78 |
| Sanitation and Waste Disposal | 79-81 |
| Connectivity | 82-92 |
| Banking and Access to credit | 93-100 |
| Electricity | 105-113 |
Using the Relevant Variable columns for Education (6-25), we will create indicator for Education theme. It is clear from looking at the varibles that school education is in focus, I believe at village level this is much more important than looking at access to college education even though it holds vital relevance. I say so because school education is under direct control of State Governments and Zila Parishad and local governments can play larger role in school education.
Indicators can be divided under Education Theme, take a look at he following table:
| Broad divisions under Education | Weights |
|---|---|
| Pre Primary | 10 |
| Primary | 30 |
| Middle School | 25 |
| Secondary | 25 |
| senior Scondary | 10 |
Further on, we will add private and government schools division wise and will say that a village with inhabitants is deprived in a particular division by using the following key
| Division | criteria 1 |
|---|---|
| Pre Primary | deprived if no school for HHs > 50 |
| Primary | deprived if no school for HHs > 100 |
| Middle School | deprived if no school for HHs > 200 |
| Secondary | deprived if no school for HHs > 500 |
| senior Scondary | deprived if no school for HHs > 1000 |
Following is the code1 for doing the above:
replace_na(`Govt Pre-Primary School (Nursery/LKG/UKG) (Status A(1)/NA(2))`,
0) -> `Govt Pre-Primary School (Nursery/LKG/UKG) (Status A(1)/NA(2))`
replace_na(`Private Pre - Primary School (Nursery/LKG/UKG) (Status A(1)/NA(2))`,
0) -> `Private Pre - Primary School (Nursery/LKG/UKG) (Status A(1)/NA(2))`
replace_na(`Govt Primary School (Status A(1)/NA(2))`,
0) -> `Govt Primary School (Status A(1)/NA(2))`
replace_na(`Private Primary School (Status A(1)/NA(2))`,
0) ->`Private Primary School (Status A(1)/NA(2))`
replace_na(`Govt Middle School (Status A(1)/NA(2))`,
0) -> `Govt Middle School (Status A(1)/NA(2))`
replace_na(`Private Middle School (Status A(1)/NA(2))`,
0) -> `Private Middle School (Status A(1)/NA(2))`
replace_na(`Govt Secondary School (Status A(1)/NA(2))`,
0) -> `Govt Secondary School (Status A(1)/NA(2))`
replace_na(`Private Secondary School (Status A(1)/NA(2))`,
0) -> `Private Secondary School (Status A(1)/NA(2))`
replace_na(`Govt Senior Secondary School (Status A(1)/NA(2))`,
0) -> `Govt Senior Secondary School (Status A(1)/NA(2))`
replace_na(`Private Senior Secondary School (Status A(1)/NA(2))`,
0) -> `Private Senior Secondary School (Status A(1)/NA(2))`
Data_MPI %>%
mutate(
tot_prepri = `Govt Pre - Primary School (Nursery/LKG/UKG) (Numbers)` +
`Private Pre - Primary School (Nursery/LKG/UKG) (Numbers)`,
flag_prepri = ifelse((tot_prepri == 0 & `Total Households`>50), 1, 0),
tot_pri = `Govt Primary School (Numbers)` +
`Private Primary School (Numbers)`,
flag_pri = ifelse((tot_pri == 0 & `Total Households`>100), 1,0),
tot_middle = `Govt Middle School (Numbers)` +
`Private Middle School (Numbers)`,
flag_middle = ifelse((tot_middle == 0 & `Total Households`>200),1,0),
tot_sec = `Govt Secondary School (Numbers)` +
`Private Secondary School (Numbers)`,
flag_sec = ifelse((tot_sec == 0 & `Total Households`>500), 1, 0),
tot_sensec = `Govt Senior Secondary School (Numbers)` +
`Private Senior Secondary School (Numbers)`,
flag_sensec = ifelse((tot_sensec==0 & `Total Households`>1000),1,0),
sch_cover = (tot_sensec + tot_sec + tot_middle +
tot_pri + tot_prepri)/`Total Households`,
EDU_ind = sch_cover + (.10*flag_sensec) + (.25*flag_sec) + (0.25*flag_middle) +
(0.30*flag_pri) + (0.10*flag_prepri)) -> Data_MPI
For creating an indicator for health, from the available data we observe the ratio of total number of vacant positions of doctors and paramedical staff at any medical service centre with respect to total number of Households, since this number can be very small, we will transform the ration by log. Recall, for Health theme, the columns from 26-49 are relevant.
Data_MPI %>%
mutate(
Heal_ind =
(
`Community Health Centre Doctors Total Strength (Numbers)` +
`Community Health Centre Para Medical Staff Total Strength (Numbers)` +
`Primary Health Centre Doctors Total Strength (Numbers)` +
`Primary Health Centre Para Medical Staff Total Strength (Numbers)` +
`Primary Heallth Sub Centre Doctors Total Strength (Numbers)` +
`Primary Heallth Sub Centre Para Medical Staff Total Strength (Numbers)` +
`Maternity And Child Welfare Centre Doctors Total Strength (Numbers)` +
`Maternity And Child Welfare Centre Para Medical Staff Total Strength (Numbers)` -
`Community Health Centre Doctors In Position (Numbers)` -
`Community Health Centre Para Medical Staff In Position (Numbers)` -
`Primary Health Centre Doctors In Position (Numbers)` -
`Primary Health Centre Para Medical Staff In Position (Numbers)` -
`Primary Heallth Sub Centre Doctors In Position (Numbers)` -
`Primary Heallth Sub Centre Para Medical Staff In Position (Numbers)` -
`Maternity And Child Welfare Centre Doctors In Position (Numbers)` -
`Maternity And Child Welfare Centre Para Medical Staff In Position (Numbers)`)) ->
Data_MPI
The above code generates on NA and to adjust the range to positive integers, I will add 10 to each observation.
a <- c(-10,-8,0,1,2,3,4,5,6,7,8,9,10,11)
Data_MPI %>% filter(Heal_ind %in% a) -> Data_MPI
Data_MPI %>%mutate(Heal_ind = Heal_ind+10) -> Data_MPI
One can argue that there is a difference of approach while choosing the indicators for Education and Health. While defining the Education indicator the approach was such that it somewhere questioned the number of schools that are present in a village and on top of it tried to see the spread of number of schools across the number of households in a village. However, while defining the indicaor for Health I have taken the presence of any helathcare service delivery infrastructure as given and assesed the deprivation as a form of absence of designated staff. This is a valid argument. I am doing so to show what different approaches can be taken.
For creating an indicator for Access to water, we will create a deprivation score depending on the source of water and seasonality, higher the score, higher the deprivation. Consider the following table:
| Source of Water | Deprivation Score |
|---|---|
| - Treated Tap water | 0 |
| - untreated Tap water | 10 |
| - Covered well | 20 |
| - Uncovered well | 30 |
| - Handpump | 40 |
| - Tube well | 50 |
| - Spring | 60 |
| - river | 70 |
| - tank/Pond/Lake | 80 |
| - Other | 90 |
The first step that we will do is create a column vector to our data frame that, regardless of the type source,shows if a village gets water around the year from at least one source. Here we also make an assumption that villages that are receiving water from aqt least one source, regardless of the type of the source, around the year are less deprived than those that receive water from different source in different seasons.
Data_MPI %>%
mutate(
all_Seasons =
any(
`Tap Water-Treated Functioning All round the year (Status A(1)/NA(2))` ==1,
`Tap Water Untreated Functioning All round the year (Status A(1)/NA(2))` == 1,
`Covered Well Functioning All round the year (Status A(1)/NA(2))` == 1,
`Uncovered Well Functioning All round the year (Status A(1)/NA(2))` == 1,
`Hand Pump Functioning All round the year (Status A(1)/NA(2))` == 1,
`Tube Wells/Borehole Functioning All round the year (Status A(1)/NA(2))` == 1,
`Spring Functioning All round the year (Status A(1)/NA(2))` == 1,
`River/Canal Functioning All round the year (Status A(1)/NA(2))` == 1,
`Tank/Pond/Lake Functioning All round the year (Status A(1)/NA(2))` == 1,
`Others Functioning All round the year (Status A(1)/NA(2))` == 1))-> Data_MPI
Since we have incorporated a vector to see if water reaches a village around the year from at least 1 source. It is also of importance to understand that sources of water matter as well, this is because treated tap water is relatively more safe as compared to water from a tank or pond. In the following code chunk, this is what we will attempt.
ifelse(`Tap Water-Treated (Status A(1)/NA(2))` == 1,0,0)->
`Tap Water-Treated (Status A(1)/NA(2))`
ifelse(`Tap Water Untreated (Status A(1)/NA(2))` == 1,10,0) ->
`Tap Water Untreated (Status A(1)/NA(2))`
ifelse(`Covered Well (Status A(1)/NA(2))` == 1,20,0) ->
`Covered Well (Status A(1)/NA(2))`
ifelse(`Uncovered Well (Status A(1)/NA(2))` == 1,30,0) ->
`Uncovered Well (Status A(1)/NA(2))`
ifelse(`Hand Pump (Status A(1)/NA(2))` == 1,40,0) ->
`Hand Pump (Status A(1)/NA(2))`
ifelse(`Tube Wells/Borehole (Status A(1)/NA(2))` == 1,50,0) ->
`Tube Wells/Borehole (Status A(1)/NA(2))`
ifelse(`Spring (Status A(1)/NA(2))` == 1,60,0) ->
`Spring (Status A(1)/NA(2))`
ifelse(`River/Canal (Status A(1)/NA(2))` == 1,70,0) ->
`River/Canal (Status A(1)/NA(2))`
ifelse(`Tank/Pond/Lake (Status A(1)/NA(2))` == 1,80,0) ->
`Tank/Pond/Lake (Status A(1)/NA(2))`
ifelse(`Others (Status A(1)/NA(2))` == 1,90,0) ->
`Others (Status A(1)/NA(2))`
Data_MPI %>%
mutate(
water_indi = log10(
`Tap Water-Treated (Status A(1)/NA(2))` +
`Tap Water Untreated (Status A(1)/NA(2))` +
`Covered Well (Status A(1)/NA(2))` +
`Uncovered Well (Status A(1)/NA(2))` +
`Hand Pump (Status A(1)/NA(2))` +
`Tube Wells/Borehole (Status A(1)/NA(2))` +
`Spring (Status A(1)/NA(2))` +
`River/Canal (Status A(1)/NA(2))` +
`Tank/Pond/Lake (Status A(1)/NA(2))` +
`Others (Status A(1)/NA(2))`) - all_Seasons) -> Data_MPI
There are three points that we look out for to create this indicator.
Is there a drainage system available in the village? If not, We socre this village in such a manner that it is depicted as deprived.
If there is a drainage system, where does the discharge gets released?
Is there a community wate disposal system?
Following is the code chunk to create an indicator for sanitaion and waste disposal. We will make two different Indicators Sanitaion (Drainage) and waste disposal.
Data_MPI %>%
mutate(
Drain_Indi =
ifelse(
`No Drainage (Status A(1)/NA(2))` == 1,100,
ifelse(
`Whether Drain water is discharged directly into water bodies or to sewar plant (For Water Bodies-1/Sewar Plants-2)` == 1,50,0)),
Waste_Indi =
ifelse(`Community waste disposal system after house to house collection (Status A(1)/NA(2))` ==
1,0,100)) ->Data_MPI
For Creating indicator for connectivity, we will consider access to Telephone, mobile, public Bus Service, Railways, Black Topped roads and All Weather roads.
Consider the following table for weights
| Type | Weights |
|---|---|
| Telephone | 20 |
| Mobile | 30 |
| Public Bus Service | 30 |
| Railways | 05 |
| Black topped roads | 10 |
| All Weather Roads | 05 |
Although we do have data that shows how far is this service accessable, if not available in a particular village, I am not differentiating because these are such basic services that need to be provided regardless of any constraints. This can be debated.
ifelse(`Telephone (landlines) (Status A(1)/NA(2))` == 2,0,1) ->
`Telephone (landlines) (Status A(1)/NA(2))`
ifelse(`Mobile Phone Coverage (Status A(1)/NA(2))` == 2,0,1) ->
`Mobile Phone Coverage (Status A(1)/NA(2))`
ifelse(`Public Bus Service (Status A(1)/NA(2))` == 2,0,1) ->
`Public Bus Service (Status A(1)/NA(2))`
ifelse(`Railway Station (Status A(1)/NA(2))` == 2,0,1) ->
`Railway Station (Status A(1)/NA(2))`
ifelse(`Black Topped (pucca) Road (Status A(1)/NA(2))` == 2,0,1) ->
`Black Topped (pucca) Road (Status A(1)/NA(2))`
ifelse(`All Weather Road (Status A(1)/NA(2))` == 2,0,1) ->
`All Weather Road (Status A(1)/NA(2))`
Data_MPI %>%
mutate(
connectivity_ind = (0.25*`Telephone (landlines) (Status A(1)/NA(2))`) +
(0.25*`Mobile Phone Coverage (Status A(1)/NA(2))`) +
(0.05*`Railway Station (Status A(1)/NA(2))`) +
(0.10* `Black Topped (pucca) Road (Status A(1)/NA(2))`)+
(0.05*`All Weather Road (Status A(1)/NA(2))`)) -> Data_MPI
A village having a commercial bank is subject to the condition if a commercial banks finds it lucrative enough to open up shop in that particular village.It is more likely that a local cooperative bank or a agricultural credit scoiety is formed and is primary banking service provider. Following are the weights associated to different types of Institutions/infrastructure.
| Type | Weights |
|---|---|
| ATM less than 5Km | 04 |
| ATM between 5-10Km | 06 |
| ATm more than 10Km | 10 |
| Commercial Bank | 20 |
| Coooperative Bank | 30 |
| Ageicultural Credit Society | 30 |
Data_MPI %>%
mutate(
Atm_In = ifelse(
`ATM (Status A(1)/NA(2))` == 1,0,
ifelse(`Nearest Facility Distance (in Km)` == "a",0.04,
ifelse(`Nearest Facility Distance (in Km)` == "b", 0.06,0.10))),
Com_bank = ifelse(`Commercial Bank (Status A(1)/NA(2))` == 1, 0, 0.2),
coop_bank = ifelse(`Cooperative Bank (Status A(1)/NA(2))` == 1,0,0.3),
Agr_credit = ifelse(`Agricultural Credit Societies (Status A(1)/NA(2))` == 1,0,0.3),
Banking_Credit_Indi = Atm_In+Com_bank+coop_bank+Agr_credit) -> Data_MPI
Dividing the three type of electricity connections and weighting accordingly.
| Type | Weights |
|---|---|
| Domestic Use | 35 |
| Agricultural Use | 35 |
| Commercial Use | 30 |
Data_MPI %>%
mutate(
dom =
ifelse(`Power Supply For Domestic Use (Status A(1)/NA(2))` == 1,
(`Power Supply For Domestic Use Summer (April-Sept.) per day (in Hours)` +
`Power Supply For Domestic Use Winter (Oct.-March) per day (in Hours)`)/2,24),
Agi =
ifelse(`Power Supply For Agriculture Use (Status A(1)/NA(2))` == 1,
(`Power Supply For Agriculture Use Summer (April-Sept.) per day (in Hours)` +
`Power Supply For Agriculture Use Winter (Oct.-March)per day (in Hours)`)/2,24),
Com =
ifelse(`Power Supply For Commercial Use (Status A(1)/NA(2))` == 1,
(`Power Supply For Commercial Use Summer (April-Sept.) per day (in Hours)` +
`Power Supply For Commercial Use Winter (Oct.-March) per day (in Hours)`)/2,24),
ele_ind =
dom+Agi+Com) -> Data_MPI
Following code chunk will create a subset of the Data_MPI object, the new object will only have the indicators and name of vilage corresponding to their disrticts and Tehsils.
Data_MPI %>% dplyr::select(contains("_ind"), 1:5) -> Data_AF
detach(Data_MPI)
attach(Data_AF)
Since all the Indicators have different scales, we will now normalize these indicators and graphically look at their Distribution.
Data_AF %>%
mutate(EDU_ind = EDU_ind/(max(EDU_ind)-min(EDU_ind)),
Heal_ind = Heal_ind/(max(Heal_ind)-min(Heal_ind)),
water_indi = water_indi/(max(water_indi)-min(water_indi)),
Drain_Indi = Drain_Indi/(max(Drain_Indi)-min(Drain_Indi)),
Waste_Indi = Waste_Indi/(max(Waste_Indi)-min(Waste_Indi)),
connectivity_ind =
connectivity_ind/(max(connectivity_ind)-min(connectivity_ind)),
Banking_Credit_Indi =
Banking_Credit_Indi/(max(Banking_Credit_Indi)-min(Banking_Credit_Indi)),
ele_ind = ele_ind/(max(ele_ind)-min(ele_ind))) ->Data_AF
Data_AF %>% gather(Indicator, Values, 1:8) %>%
ggplot(aes(Indicator, Values)) +
geom_boxplot(aes(color = Indicator)) +
theme_grey() +
guides(color = FALSE)+
coord_flip()
Now that we have seen the distribution of the indicators, let us see the correlation between the indicators.
library(corrplot)
library(ggcorrplot)
cor(Data_AF[,1:8])->ob_cor
ob_cor %>% ggcorrplot(type = "lower", lab = TRUE)
The first thing that we do is to decide cutoffs. In the following code chunks we will decide the cutoffs for all the Indicators.
let us now look at the mean and the median of the indicators when grouped by Districts.
Data_AF %>% gather(Indicator, Values, 1:8) %>%
ggplot(aes(Values, colour = Indicator))+
geom_freqpoly()+
facet_wrap(~Indicator)+
guides(colour =FALSE)
The above frequency plot shows how many villages take a particular value for each of the indicators. While deciding the cutoffs it is important to keep in mind not to choose a cutoff that is too high, this will mark every village as deprived. The flip side to this is that not to choose a cutoff that is too low, here one faces the risk of marking every village as not deprived.
The MPI as a tool allows one to present multiple deprivation of each observation in a population. The standard of what can be difined as a deprived or not deprived observation is a complex choice that has to be made by politicians, administrators and experts in cooperation with each other. Depending on this choice and definition, various targeted policies or schemes can be designed. Deciding the cutoffs is also one of these choices. This is a simplistic explanation of what the A-F method calls Union and Intersection approach.
Here, I will be choosing the cutoffs by looking at the frequency plot and try to make reasonable choice.
cutoff <- c(0.10, 0.5, 0.75, 0.75, 1, 1.25, 0.75, 1)
The deprivation matrix is obtained by creating a vector that shows wether a particular observation for an indicator is less than the cutoff. If less than the cutoff, that observation is considered to be deprived in that indicator. The following code chunk creates the deprivation matrix. We will call it g0
Data_AF %>%
mutate(EDU_dep = ifelse(EDU_ind < 0.25, 1, 0),
Heal_dep = ifelse(Heal_ind < 0.5, 1, 0),
Water_dep = ifelse(water_indi < 0.75, 1, 0),
Drain_dep = ifelse(Drain_Indi < 0.75, 1, 0),
Waste_dep = ifelse(Waste_Indi < 1, 1, 0),
Connect_dep = ifelse(connectivity_ind < 1.25, 1, 0),
Banking_dep = ifelse(Banking_Credit_Indi < 0.75, 1, 0),
Ele_dep = ifelse(ele_ind < 1, 1, 0)) %>%
dplyr::select(ends_with("dep"),`District Name`:`Total Population of Village`) -> G_0
head(G_0)
detach(Data_AF)
Using the deprivation matrix,in the following code chunk, we will form the weighted Deprivation Matrix. We will use the following weights, as described in the table below:
| Theme (No. of Indicators) | Weight |
|---|---|
| Education(1) | 0.20 |
| Health (1) | 0.20 |
| Access to Water (1) | 0.10 |
| Sanitation and Waste Disposal(2) | 0.075 (each) |
| Connectivity (1) | 0.10 |
| Banking and Access to credit(1) | 0.10 |
| Electricity(1) | 0.15 |
w = c(0.20, 0.20, 0.10, 0.075, 0.075, 0.10, 0.10, 0.15)
attach(G_0)
G_0 %>% mutate( EDU_w = ifelse(EDU_dep != 0, 0.20, 0),
Heal_w = ifelse(Heal_dep != 0, 0.20, 0),
Water_w = ifelse(Water_dep != 0, 0.10, 0),
Drain_w = ifelse(Drain_dep != 0, 0.075, 0),
Waste_w = ifelse(Waste_dep != 0, 0.075, 0),
Connect_w = ifelse(Connect_dep != 0, 0.10, 0),
Banking_w = ifelse(Banking_dep != 0, 0.10, 0),
Ele_w = ifelse(Ele_dep != 0, 0.15, 0)) %>%
dplyr::select(ends_with("w"),`District Name`:`Total Population of Village`) -> G_0_bar
head(G_0_bar)
detach(G_0)
attach(G_0_bar)
For every village we will add the deprivation score for each corresponding indicator, consequently adding a vector to the weighted derivation matrix. We will add another vector that will indicate if a particular village has 0 deprivations or not, in essence the sum of this vector will be the uncensored headcount.
G_0_bar %>%
mutate( dep_sum = EDU_w + Heal_w + Water_w + Drain_w +
Waste_w + Connect_w + Banking_w + Ele_w,
UncensoredHC = ifelse(dep_sum !=0, 1, 0)) -> G_0_bar
print("the uncensored headcount is:")
## [1] "the uncensored headcount is:"
sum(G_0_bar$UncensoredHC)
## [1] 40958
print("total number of villages: ")
## [1] "total number of villages: "
nrow(G_0_bar)
## [1] 40958
As mention before this leaves only 5 villages across Maharashtra that are not depreived in any of the themes. This is the problem with the union approach. We will set a second cutoff by using the freshly brewed variable dep_sum.
We will set another cutoff, k. say if k is set at 40%, or euqal to 0.4, this would mean a village is multidimensionaly deprived if the dep_sum of that village is greater than or equal to k.
k = 0.50
G_0_bar %>%
mutate(dep_sum_k = ifelse(dep_sum >= k, dep_sum, 0),
censoredHC_k = ifelse(dep_sum_k != 0, 1, 0),
EDU_w_k = ifelse(dep_sum_k == 0, dep_sum_k, EDU_w),
Heal_w_k = ifelse(dep_sum_k == 0, dep_sum_k, Heal_w),
Water_w_k = ifelse(dep_sum_k == 0, dep_sum_k, Water_w),
Drain_w_k = ifelse(dep_sum_k == 0, dep_sum_k, Drain_w),
Waste_w_k = ifelse(dep_sum_k == 0, dep_sum_k, Waste_w),
Connect_w_k = ifelse(dep_sum_k == 0, dep_sum_k, Connect_w),
Banking_w_k = ifelse(dep_sum_k == 0, dep_sum_k, Banking_w),
Ele_w_k = ifelse(dep_sum_k == 0, dep_sum_k, Ele_w)) %>%
dplyr::select(`District Name`: `Total Population of Village`, ends_with("k")) -> G_0_bar_k
detach(G_0_bar)
attach(G_0_bar_k)
Now that we have obtained the desired censored matrix, next step is to claculate the H,Headcount ratio, A (Intensity) and M0 (MPI) for Maharashtra state. We will define a function that will help calcualte the MPI, it will take two arguments, the censored matrix and the weights that were assigned to the themes.
Calc_MPI <- function(x,y) {
x %>% filter(x$dep_sum_k != 0) %>% nrow()-> count_k
HeadCountRatio_k = count_k/nrow(x)
Intensity_k = sum(x$dep_sum_k)/sum(y*count_k)
M = HeadCountRatio_k * Intensity_k
MPI <- data.frame(H_K = count_k, H_k_ratio = HeadCountRatio_k,
A_k = Intensity_k, MPI = M)
return(MPI)
}
Calc_MPI(G_0_bar_k,w) ->mah_mpi
mah_mpi
It is clear that 35390 villages of Maharashtra are Multidimensionally Deprived, i.e. nearly 86% of the villages of Maharashtra. The intensity is around 74%, at k= 40% or more. The MPI of the state is 0.64
The following code chunk will create a MPI for each district. Further, we will check the contribution of each district.
G_0_bar_k %>% group_by(`District Name`) %>%
summarise(Dis_dep_sum_k = sum(dep_sum_k),
total_villages = n(),
Dis_headcount_k = length(dep_sum_k[dep_sum_k!=0]),
Dis_headcountRatio_k = Dis_headcount_k/total_villages,
Dis_intensity_k = Dis_dep_sum_k/sum(Dis_headcount_k * w),
Dis_MPI = Dis_headcountRatio_k * Dis_intensity_k,
Dis_Contribution = (total_villages*Dis_MPI)/(40958* mah_mpi$MPI),
Dis_EDU_headcount_k = length(EDU_w_k[EDU_w_k !=0]),
Dis_Heal_headcount_k = length(Heal_w_k[Heal_w_k!=0]),
Dis_water_headcount_k = length(Water_w_k[Water_w_k!=0]),
Dis_Drain_headcount_k = length(Drain_w_k[Drain_w_k!=0]),
Dis_waste_headcount_k = length(Waste_w_k[Waste_w_k!=0]),
Dis_Connect_headcount_k = length(Connect_w_k[Connect_w_k!=0]),
Dis_Banking_headcount_k = length(Banking_w_k[Banking_w_k!=0]),
Dis_Ele_headcount_k = length(Ele_w_k[Ele_w_k!=0])) -> District_break
detach(G_0_bar_k)
attach(District_break)
We will perform a minor check by seeing if the total contruibution of all the districts comes to 1, i.e. 100%
sum(District_break$Dis_Contribution)
## [1] 1
Let us now try and visualize this population subgroup decomposition
p1<-District_break %>%
ggplot(aes(Dis_headcountRatio_k, Dis_intensity_k,
alpha =0.5,label = `District Name`))+
geom_jitter(aes(color = as.factor(round(Dis_MPI,digits = 1)),
size =round(Dis_Contribution,digits = 2)))+
geom_text_repel()+
scale_color_viridis_d()+
labs(x = "Headcount Ratio", y= "Itensity",
color = "MPI Value", size = "District's contribution to State MPI")+
guides(alpha = FALSE)
In this section we will try and understand which dimension, or what I have called theme, contributes how much to the MPI. The contribution of each dimension is the product of the ratio of the censored headcount of that dimension and MPI of entire Population and the weight of that dimension. Say for Dimension 1, h is the censored headcount ratio, M is MPI of entire population, r= h/m. And w is the weight of Dimension 1. Then contribution of Dimension 1 is equal to w*r. Here, I have assigned weights in a manner where sum of weights of all the dimensions comes to be 1. In case this sum is equal to the number of dimensions, the the formula for contribution of a particular dimension would be (w/d)r. Where d is the total number of dimensions.
District_break %>%
mutate( Dis_Contri_EDU = w[1]*(Dis_EDU_headcount_k/(Dis_MPI * total_villages)),
Dis_Contri_Heal = w[2] * (Dis_Heal_headcount_k/(Dis_MPI * total_villages)),
Dis_Contri_Water = w[3] * (Dis_water_headcount_k/(Dis_MPI * total_villages)),
Dis_Contri_Drain = w[4] * (Dis_Drain_headcount_k/(Dis_MPI * total_villages)),
Dis_Contri_Waste = w[5] * (Dis_waste_headcount_k/(Dis_MPI * total_villages)),
Dis_Contri_Connect = w[6] * (Dis_Connect_headcount_k/(Dis_MPI * total_villages)),
Dis_Contri_Banking = w[7] * (Dis_Banking_headcount_k/(Dis_MPI * total_villages)),
Dis_Contri_Ele = w[8] * (Dis_Ele_headcount_k/(Dis_MPI * total_villages))) ->District_break
p2<-District_break %>%
dplyr::select(`District Name`, Dis_Contri_EDU : Dis_Contri_Ele ) %>%
gather(Contribution, value, 2:9) %>%
ggplot(aes(`District Name`, value, fill = Contribution, alpha = 0.5))+
geom_col()+
scale_fill_brewer(palette = "Set3")+
theme(axis.text.x = element_text(angle = 90))+
labs(title = "Dimnesional Contribution For Each District", y ="Contribution")+
guides(alpha = FALSE)
p1/p2
library(maptools)
library(raster)
library(rgdal)
library(sp)
library(plotly)
#changing column name of district with NAME_1 inorder to joing the data and shape file
# Also Changing the name of Gadchiroli district to Garhchiroli, as in the shape file, even though it is incorrectly spelt.
names(District_break)->c
c[1]<-"NAME_2"
colnames(District_break)<-c
replace(District_break$NAME_2,10,"Garhchiroli")->District_break$NAME_2
India<- getData(name = "GADM", country = "IN", level = 2)
India<- spTransform(India, CRS("+init=EPSG:24381"))
Maharashtra<- India[India@data$NAME_1 == "Maharashtra",]
Maharashtra@data$id = rownames(Maharashtra@data)
Maharashtra@data <- plyr::join(Maharashtra@data, District_break, by= "NAME_2")
Maharashtra_fortified<- fortify(Maharashtra)
Maharashtra_fortified<- plyr::join(Maharashtra_fortified, Maharashtra@data, by="id")
##MPI at District level
ggplot(Maharashtra_fortified, aes(x=long,y=lat, group = group, label = Maharashtra_fortified$NAME_2))+
geom_polygon(aes(fill = Dis_MPI), colour = "red")+
ggplot2::scale_fill_viridis_c() ->p3
ggplotly(p3) %>%
highlight(
"plotly_hover",
selected = attrs_selected(line= list(color = "red"))
)
##Headcount Ratio at Dsitrict Level
ggplot(Maharashtra_fortified, aes(x=long,y=lat, group = group, label = Maharashtra_fortified$Dis_headcountRatio_k))+
geom_polygon(aes(fill = Dis_headcountRatio_k), colour = "red")+
ggplot2::scale_fill_viridis_c() ->p4
ggplotly(p4) %>%
highlight(
"plotly_hover",
selected = attrs_selected(line= list(color = "red"))
)
## INtensity at District Level
ggplot(Maharashtra_fortified, aes(x=long,y=lat, group = group, label = Maharashtra_fortified$Dis_intensity_k))+
geom_polygon(aes(fill = Dis_intensity_k), colour = "red")+
ggplot2::scale_fill_viridis_c() ->p5
ggplotly(p5) %>%
highlight(
"plotly_hover",
selected = attrs_selected(line= list(color = "red"))
)
The commands written to replace NA are something like a contingency. In case there is a NA value and I have missed it, it should not affect the addition.↩