1.Theoretical Section

1.1 Introduction

In developing countries, nutritional deficiency is the main cause of under- five mortality. Around, 6.3 million children of under five years of age die every year. In Pakistan, situation is even more alarming roughly about 38 percent of children under five years of age are stunted. Incomplete or no immunization puts the children at risk of infection from diseases like TB, measles, polio and other vaccine preventable diseases. In 2008, it was estimated that 17 percent of all the deaths among 0-59 months of children were vaccine preventable. Maintaining an improved child health status becomes a key challenge for majority of developing and lower middle income countries as it plays an important role in the social and capital growth of the country (UNICEF, 2019).

There exists an abundant literature which focuses on evaluating the child health outcomes in terms of socioeconomic characteristics. However, these studies examined the correlation and spatial heterogeneity among child malnutrition and immunization coverage, the association between child health outcomes and health care utilization and other characteristics has not been explored yet. Recent research on public health suggests that geospatial mapping and modeling of various and diverse demographic events help in identification of the pockets and therefore, help in efficient resource allocation. Therefore the present study, is a an attempt to this field of research and to explore the existence of spatial heterogeneity in the selected child health outcome i.e stunting on the basis of immunization, women literacy, child health insurance and other socioeconomic characteristics in the districts of province Punjab, Pakistan.

1.2 Methodology

1.2.1 Data Sources

Data related to all variables have been retrieved from Multiple Indicator Cluster Survey (MICS, 2017-18).Its a household survey that aimed at providing district level estimates on population, health, nutrition and other demographic and socioeconomic indicators(https://www.unicef.org/pakistan/reports/multiple-indicator-cluster- survey-2017-18-punjab).Selection of variables was done by keeping in view the previous studies and there importance with the outcome variable i.e stunting. The shape files to create maps were obtained from DIVA-GIS (//www.diva-gis.org/gdata). All districts of province Punjab are a part of this study.Following are the variables which have been used in this study.All variables are in percentage.

  • Stunting rate in Children (stnd)

  • Women Exposure to Mass Media (mass_media)

  • Pre mature birth (pre_mtr_brth)

  • Birth before age of 18

  • Children who have Full Immunization (full_immunization)

  • Children who received mothers milk continuously for two years (cnt_mil_2)

  • Women who got married before the age of 15 (mar_15)

  • Women Literacy rate (wmn_litr)

  • Children who are receiving Health Insurance (hlth_ins_chd)

1.2.2 Analytical Approach to this Study

This study will employ a rigorous and analytic approach in estimation. We will simply start by estimating OLS regression. Then to see in our case if we need and require the use of spatial methods and models we will perform a test on OLS residuals to see if the residuals are spatially auto-correlated or spatially random? If it turns out that they are spatially auto-correlated then the various spatial models such as SDM, SLM, SEM and SAR etc will be estimated.The results of these models will be compared based on the AIC criteria and the significance of explanatory variables Furthermore, to determine the spatial autocorrelation and clustering for our variable of interest i.e stunting Moron’s I would be calculated. It takes the value between +1 to -1.Positive spatial autocorrelation demonstrates that points with similar attribute values are closely distributed in space and points with negative spatial autocorrelation indicates that closely associated points are more dissimilar. We will run this test to see if we have spatial autocorrelation/dependency among locations. To further extend this analysis and to analyze relationships Joint Count Test will also be performed to test spatial relationships.

2.Applied Section

Before turning ourselves to applied part lets first load the required libraries that are compulsory to run the different estimations.

#Loading the required libraries

library(ggplot2)
library(sf)
## Warning: package 'sf' was built under R version 4.0.4
## Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(maps)
## Warning: package 'maps' was built under R version 4.0.5
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v tibble  3.0.4     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   2.0.0     v forcats 0.5.0
## v purrr   0.3.4
## Warning: package 'readr' was built under R version 4.0.5
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## x purrr::map()    masks maps::map()
library("readxl")
library(rgdal)
## Warning: package 'rgdal' was built under R version 4.0.4
## Loading required package: sp
## rgdal: version: 1.5-23, (SVN revision 1121)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.2.1, released 2020/12/29
## Path to GDAL shared files: C:/Users/muhammad usman/Documents/R/win-library/4.0/rgdal/gdal
## GDAL binary built with GEOS: TRUE 
## Loaded PROJ runtime: Rel. 7.2.1, January 1st, 2021, [PJ_VERSION: 721]
## Path to PROJ shared files: C:/Users/muhammad usman/Documents/R/win-library/4.0/rgdal/proj
## PROJ CDN enabled: FALSE
## Linking to sp version:1.4-5
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading rgdal.
## Overwritten PROJ_LIB was C:/Users/muhammad usman/Documents/R/win-library/4.0/rgdal/proj
library(maptools)
## Checking rgeos availability: TRUE
library(sp)
library(RColorBrewer)
library(GISTools)
## Warning: package 'GISTools' was built under R version 4.0.5
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
## Loading required package: rgeos
## Warning: package 'rgeos' was built under R version 4.0.5
## rgeos version: 0.5-5, (SVN revision 640)
##  GEOS runtime version: 3.8.0-CAPI-1.13.1 
##  Linking to sp version: 1.4-5 
##  Polygon checking: TRUE
## 
## Attaching package: 'GISTools'
## The following object is masked from 'package:maps':
## 
##     map.scale
library(maps)
library("splm")
## Warning: package 'splm' was built under R version 4.0.5
## Loading required package: spdep
## Warning: package 'spdep' was built under R version 4.0.4
## Loading required package: spData
## Warning: package 'spData' was built under R version 4.0.4
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
library("car")
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following object is masked from 'package:purrr':
## 
##     some
library("lmtest")
## Warning: package 'lmtest' was built under R version 4.0.5
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library("Formula")
library("plm")
## Warning: package 'plm' was built under R version 4.0.5
## 
## Attaching package: 'plm'
## The following objects are masked from 'package:dplyr':
## 
##     between, lag, lead

Lets first read the data from our directory.

my_data <- read_excel("pk_data.xlsx")
head(my_data, 5)# Showing the first 5 observations 
## # A tibble: 5 x 11
##   District  ID_3 mass_media pre_mtr_brth brth_bf_18 full_imunization cnt_mil_2
##   <chr>    <dbl>      <dbl>        <dbl>      <dbl>            <dbl>     <dbl>
## 1 Bahawal~    77        0.5          3.2       15.3             60        28.3
## 2 Bahawal~    78        1            1.9       10.6             61.2      39.7
## 3 Rahimya~    79        0.3          5.2       17.5             62.3      48.5
## 4 Dera Gh~    80        0.2          6.7       17.2             44.1      36.1
## 5 Layyah      81        0.3          6         10.7             72.9      36.3
## # ... with 4 more variables: stnd <dbl>, mar_15 <dbl>, hlth_ins_chd <dbl>,
## #   wmn_litr <dbl>
#Lets also read the district level shape file for Pakistan 
dis.sf<-st_read("PAK_adm3.shp")
## Reading layer `PAK_adm3' from data source `D:\Spatial Econometrics\Term paper\PAK_adm3.shp' using driver `ESRI Shapefile'
## Simple feature collection with 141 features and 13 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 60.89944 ymin: 23.70292 xmax: 77.84308 ymax: 37.09701
## Geodetic CRS:  WGS 84
head(dis.sf, 3)
## Simple feature collection with 3 features and 13 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 73.49163 ymin: 32.77836 xmax: 74.47195 ymax: 34.12512
## Geodetic CRS:  WGS 84
##   ID_0 ISO   NAME_0 ID_1       NAME_1 ID_2       NAME_2 ID_3  NAME_3   TYPE_3
## 1  171 PAK Pakistan    1 Azad Kashmir    1 Azad Kashmir    1    Bagh District
## 2  171 PAK Pakistan    1 Azad Kashmir    1 Azad Kashmir    2 Bhimber District
## 3  171 PAK Pakistan    1 Azad Kashmir    1 Azad Kashmir    3   Kotli District
##   ENGTYPE_3 NL_NAME_3 VARNAME_3                       geometry
## 1  District      <NA>      <NA> MULTIPOLYGON (((74.04134 33...
## 2  District      <NA>      <NA> MULTIPOLYGON (((74.39078 32...
## 3  District      <NA>      <NA> MULTIPOLYGON (((74.01398 33...

This above is multipolygon object having 3 features and 13 fields. Feature named NAME_3 indicates the districts of Pakistan. As this study focuses only on the districts of province Punjab so its better to filter out only the districts of province Punjab in a separate file and saved it as punjb.

#filtering out the districts of province punjab for our purpose
punjb <- dis.sf %>% filter(NAME_1 == "Punjab")

Now lets visualize the map of districts of province Punjab. Following lines of code will serve our purpose.

ggplot() +
  geom_sf(punjb, mapping=aes(geometry=geometry, fill=NAME_3))+
  labs(title= "Map of the Districts of Punjab Province")

In order to inspect and visualize the child stunting rate across districts we need to merge our data files i.e shape file and data file by the same column ID_3 (contains the map_ID as declared by DIV-GIS).

#Taking the desired data to see spatial distribution of stunting rate across districts

data <- my_data[, c("ID_3", "stnd")]
punjb <- merge(punjb, data, by= "ID_3")

Following map shows the spatial distribution and variation in stunting rate across districts of province Punjab.

ggplot() +
  geom_sf(punjb, mapping=aes(geometry=geometry, fill=stnd))+
   scale_fill_gradient(low = "#fff7ec",  high = "#7F0000") +
  labs(title= "Spatial Distribution of Stunting rate in Districts of Province Punjab 2017-18")

The above map shows the spatial distribution and variation of stunting rate in districts of Punjab. We can see that southern districts of Punjab such as Rahim Yar Khan, Lodhran, Dear Gazi Khan and Rajanpur tend to have higher percentage of stunting. Contrary to this, districts such as Rawalpindi, Attock, Jehlum and Chakwal etc possess lower rates of stunting.

2.1 Data Preprocessing and Transformations

Its always recommended and highly regarded to see the distributions of our features before applying any sort of modeling. As we can see in the following figures, the distributions for few of the variables such as mass_media, pre_mar_brth, mar_15, and hlth_ins_chd don’t seem to look normal. Its better to apply log and then see if it makes any difference.

par(mfrow=c(3,4))
par(mar=c(2,2,1,1))
hist(my_data$mass_media)
hist(my_data$pre_mtr_brth)
hist(my_data$brth_bf_18)
hist(my_data$full_imunization)
hist(my_data$cnt_mil_2)
hist(my_data$stnd)
hist(my_data$mar_15)
hist(my_data$hlth_ins_chd)
hist(my_data$wmn_litr)

Visualizing log transformed variables and one can easily assess log transformation does help our cause.

As after log transformation our data looks in a better shape. Additionally, we dont require first two columns for our modeling purpose so its better to remove them.

2.2.1 Contiguity Matrix

To run and estimate spatial models we require spatial weights matrix which is the key feature of spatial modeling which demonstrate the structure of neighborhood. There are different types of neighborhood matrices :

  • Contiguity matrix (matrix of common border)

  • Matrix of k nearest neighbors

  • Matrix of neighbors within d km (neighbors in radius of d km)

The choice of the spatial weights matrix modeling is an important step. When a-priori selected matrix does not correspond to a real relationship, there is a need to find more adequate spatial weights matrix. Lets estimate the contiguity matrix. We require the spatial object and will use poly2nb function.

#lets convert into spaital object to get the contiguity matrix
support.sp <- as(punjb, "Spatial")
#lets calculate contiguity matrix

cont.nb<-poly2nb(as(support.sp, "SpatialPolygons"))

cont.listw<-nb2listw(cont.nb,  style="W")

cont.listw
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 38 
## Number of nonzero links: 184 
## Percentage nonzero weights: 12.74238 
## Average number of links: 4.842105 
## 
## Weights style: W 
## Weights constants summary:
##    n   nn S0       S1       S2
## W 38 1444 38 16.72392 156.2708

In the above output, Average number of links equals 4.84 which means on average every point or in our case every district has 4.84 neighbors.

#lets look at the row standardize matrix
punj.mt <- nb2mat(cont.nb)
punj.mt[1:5, 1:5] 
##        [,1]      [,2]      [,3] [,4]      [,5]
## 1 0.0000000 0.2500000 0.0000000 0.00 0.0000000
## 2 0.1666667 0.0000000 0.1666667 0.00 0.0000000
## 3 0.0000000 0.3333333 0.0000000 0.00 0.0000000
## 4 0.0000000 0.0000000 0.0000000 0.00 0.3333333
## 5 0.0000000 0.0000000 0.0000000 0.25 0.0000000

This above output shows the row standardized contiguity matrix. It has been assigned weights depending upon the neighborhood structure. #### 2.2 Linear Regression Model

#lm model and to see if we can use spatial methods

model.lm1 <-lm(stnd~mass_media+pre_mtr_brth+full_imunization+
                 cnt_mil_2+mar_15+hlth_ins_chd+brth_bf_18+
                 wmn_litr, data=my_data)
summary(model.lm1)
## 
## Call:
## lm(formula = stnd ~ mass_media + pre_mtr_brth + full_imunization + 
##     cnt_mil_2 + mar_15 + hlth_ins_chd + brth_bf_18 + wmn_litr, 
##     data = my_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.2970 -2.5864  0.3983  1.9004  6.3268 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      40.71592    8.70063   4.680 6.17e-05 ***
## mass_media        0.35871    1.63721   0.219   0.8281    
## pre_mtr_brth     -0.07312    0.05559  -1.315   0.1987    
## full_imunization -0.08547    0.06480  -1.319   0.1975    
## cnt_mil_2         0.06524    0.09399   0.694   0.4931    
## mar_15            0.69894    0.52847   1.323   0.1963    
## hlth_ins_chd     -0.62108    0.27948  -2.222   0.0342 *  
## brth_bf_18        0.32813    0.21226   1.546   0.1330    
## wmn_litr         -0.20675    0.08469  -2.441   0.0210 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.447 on 29 degrees of freedom
## Multiple R-squared:  0.8364, Adjusted R-squared:  0.7913 
## F-statistic: 18.54 on 8 and 29 DF,  p-value: 1.801e-09

Our OLS estimates suggest that hlth_ins_chd and wmn_litr possess a significant relationship with the stunting rate. However,our main purpose here is not to interpret these coefficients but to see if we can apply spatial methods. Lets now run a test for regression residuals to see if they are spatially random or spatially auto-correlated.

H0: Residuals are spatially random

H1: Residuals are spatially correlated

res1 <- model.lm1$residuals
lm.morantest(model.lm1, cont.listw) # are residuals spatially random?
## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = stnd ~ mass_media + pre_mtr_brth + full_imunization
## + cnt_mil_2 + mar_15 + hlth_ins_chd + brth_bf_18 + wmn_litr, data =
## my_data)
## weights: cont.listw
## 
## Moran I statistic standard deviate = 3.1576, p-value = 0.0007953
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##      0.177716097     -0.090078278      0.007192518
moran.test(res1, cont.listw) #justification for spatial models
## 
##  Moran I test under randomisation
## 
## data:  res1  
## weights: cont.listw    
## 
## Moran I statistic standard deviate = 2.0229, p-value = 0.02154
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##        0.17771610       -0.02702703        0.01024351

As p value is less then assumed level of significance 0.05, we reject null hypothesis in favor of the alternative hypothesis stating that residuals are spatially random. In fact they are not spatially random but they are related in space means spatially correlated.

We can further run the Moron’s I test to see if we observe a spatial dependency in stunting rate across districts or in space.

#Moron test for spatial autocorrelation among neighbors
result01<-moran.test(my_data$stnd, cont.listw) 
result01
## 
##  Moran I test under randomisation
## 
## data:  my_data$stnd  
## weights: cont.listw    
## 
## Moran I statistic standard deviate = 7.2508, p-value = 2.071e-13
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##        0.70465479       -0.02702703        0.01018282

As the p- value is quite low suggesting that we will reject null hypothesis in favor of alternative hypothesis. It means that we observe spatial association or spatial autocorrelation in stunting rate in space.

2.3 Moron’s Scatter Plot

Lets draw the Moron’s scatter plot to see whether high regions are next to high and low regions are next to low.

x<-my_data$stnd # variable selection
zx<-as.data.frame(scale(x))  #standardization of variable

# Moran scatterplot – automatic version
moran.plot(zx$V1, cont.listw, pch=19, labels=as.character(my_data$District))

We can see in the above scatter plot high regions are next to each other and low regions are next to each other. We can even draw the plot and clearly observe clusters. Districts possessing higher stunting rates are grouped together and are next to each other in comparison to districts possessing lower stunting rates.

2.3.1 Moron’s Plot

We can even draw a map and see how districts are grouped together. Following lines of code will create the desired map.

# map
brks<-c(1,2,4)
cols<-c("grey25", "grey60", "grey85", "grey45")
plot(support.sp, col=cols[findInterval(cond$V1, brks)])
legend("bottomright", legend=c("I Q - HH – high surrounded by high", "II Q - LH - low surrounded by high", "III Q - LL - low surrounded by low", "IV Q - HL - high surrounded by low"), fill=cols, bty="n", cex=0.80)
title(main="Mapping of Moranscatterplot results
Stunting rate in 2017-18")

The above moron’s plot clearly demonstrates clusters of high being surrounded by high regions and low region surrounded by low regions.

2.4 Joint Count Test

We can further run Joint Count Test to test spatial relationships. The idea of the test is to divide the variable value into groups, assign a colour to them and apply colours to the map. The number of contacts (connections) of the same colours relative to the total number of contacts is tested.

var.factor<-factor(cut(my_data$stnd, breaks=c(15,30,45,60), labels=c("low", "medium", "high")))
head(var.factor, 10)
##  [1] medium medium high   high   low    medium high   low    medium low   
## Levels: low medium high
# parameters of graphics
brks1<-c(15,30,45,60) 
cols<-c("green", "blue", "red")

# scatterplot of values
plot(my_data$stnd, bg=cols[findInterval(my_data$stnd, brks1)], pch=21)
abline(h=c(10,20,40), lty=3)

# spatial distribution with three colours
plot(support.sp, col=cols[findInterval(my_data$stnd, brks1)])
#plot(voi, add=TRUE, lwd=2)
title(main="Stunting Rate in 2017-18")
legend("bottomleft", legend=c("low", "medium", "high"), leglabs(brks1), fill=cols, bty="n")

joincount.test(var.factor, cont.listw)
## 
##  Join count test under nonfree sampling
## 
## data:  var.factor 
## weights: cont.listw 
## 
## Std. deviate for low = 4.474, p-value = 3.839e-06
## alternative hypothesis: greater
## sample estimates:
## Same colour statistic           Expectation              Variance 
##             7.6458333             5.1351351             0.3149243 
## 
## 
##  Join count test under nonfree sampling
## 
## data:  var.factor 
## weights: cont.listw 
## 
## Std. deviate for medium = 5.4415, p-value = 2.642e-08
## alternative hypothesis: greater
## sample estimates:
## Same colour statistic           Expectation              Variance 
##              5.600794              2.837838              0.257821 
## 
## 
##  Join count test under nonfree sampling
## 
## data:  var.factor 
## weights: cont.listw 
## 
## Std. deviate for high = 4.7961, p-value = 8.091e-07
## alternative hypothesis: greater
## sample estimates:
## Same colour statistic           Expectation              Variance 
##            0.66666667            0.08108108            0.01490772

2.3 Spatial Models

Lets turn to the most important part of our discussion i.e spatial models. To run and estimate spatial models we require formula form that includes variables both dependent and independent. First lets create the object form.

form <- formula(stnd~mass_media+ pre_mtr_brth+brth_bf_18+full_imunization+
                  cnt_mil_2+mar_15+hlth_ins_chd+wmn_litr)

2.3.1 General Nesting Spatial Model (GNS)

We will start with Manski model. Its a full specified model it includes spatial lag of Y (rho), spatial lag of X (theta) and spatial error term (lambda).

GNS: Y=β_0+ρWY+Xβ+WXθ+u and u=λWu+e

where rho (ρ) is the spatial lag of Y, theta (θ) represents spatial lag of X, lambda (λ spatial error term and Xβ set of other explanatory variables. W represents the contiguity matrix. It may be symmetric, but not necessarily , wij - weights for each pair of locations; result from a rule on neighborhood structure and define the existence and strength of the spatial relations among locations.

GNS_1<-sacsarlm(form, data=my_data, listw=cont.listw, type="sacmixed", method="LU") 
# method="LU" speeds up computations and option type="sacmixed" activates spatial lags of X

summary(GNS_1)
## 
## Call:sacsarlm(formula = form, data = my_data, listw = cont.listw, 
##     type = "sacmixed", method = "LU")
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -4.09534 -0.80346  0.12921  1.18365  3.25735 
## 
## Type: sacmixed 
## Coefficients: (numerical Hessian approximate standard errors) 
##                       Estimate Std. Error z value Pr(>|z|)
## (Intercept)           0.552720  14.107561  0.0392 0.968748
## mass_media           -0.395435   1.037924 -0.3810 0.703213
## pre_mtr_brth         -0.069554   0.035759 -1.9451 0.051765
## brth_bf_18            0.269999   0.167239  1.6145 0.106429
## full_imunization     -0.090476   0.044893 -2.0154 0.043867
## cnt_mil_2             0.019597   0.055078  0.3558 0.721980
## mar_15                0.119358   0.357107  0.3342 0.738202
## hlth_ins_chd          0.083507   0.276575  0.3019 0.762704
## wmn_litr             -0.227266   0.086959 -2.6135 0.008963
## lag.mass_media        2.339437   2.366375  0.9886 0.322851
## lag.pre_mtr_brth      0.134568   0.090640  1.4846 0.137639
## lag.brth_bf_18       -0.482909   0.356692 -1.3539 0.175782
## lag.full_imunization  0.270064   0.106845  2.5276 0.011484
## lag.cnt_mil_2        -0.036031   0.150335 -0.2397 0.810585
## lag.mar_15            1.695953   0.864275  1.9623 0.049730
## lag.hlth_ins_chd     -0.373091   0.557646 -0.6690 0.503466
## lag.wmn_litr          0.124901   0.135474  0.9220 0.356554
## 
## Rho: 0.57298
## Approximate (numerical Hessian) standard error: 0.23831
##     z-value: 2.4044, p-value: 0.016201
## Lambda: -0.32199
## Approximate (numerical Hessian) standard error: 0.47565
##     z-value: -0.67695, p-value: 0.49844
## 
## LR test value: 35.998, p-value: 8.4256e-05
## 
## Log likelihood: -77.80885 for sacmixed model
## ML residual variance (sigma squared): 3.1625, (sigma: 1.7784)
## Number of observations: 38 
## Number of parameters estimated: 20 
## AIC: 195.62, (AIC for lm: 211.62)

The Manski model contains a long list of variables. Its not easy to interpret and one more thing we could see our spatial lag of Y and spatial error term both are insignificant. However AIC indicates that AIC for spatial model is low as compared to linear regression model. Therefore, we should continue with spatial models.

2.3.2 Spatial Autoregressive Combined Model (SARAR)

Typical form of SAC model is as follows. It includes spatial lag of Y (rho) and spatial error term.

SAC: Y=β_0+ρWY+Xβ+u and u=λWu+e

SAC_1<-sacsarlm(form, data=my_data2, listw=cont.listw)
summary(SAC_1)
## 
## Call:sacsarlm(formula = form, data = my_data2, listw = cont.listw)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -5.03058 -1.03982 -0.13507  1.82063  4.54803 
## 
## Type: sac 
## Coefficients: (asymptotic standard errors) 
##                   Estimate Std. Error z value  Pr(>|z|)
## (Intercept)      54.558900   9.440499  5.7792 7.504e-09
## mass_media       -0.596413   1.005177 -0.5933 0.5529529
## pre_mtr_brth     -0.076821   0.037591 -2.0436 0.0409930
## brth_bf_18        0.327994   0.136079  2.4103 0.0159387
## full_imunization -0.141483   0.043270 -3.2698 0.0010763
## cnt_mil_2         0.030935   0.061019  0.5070 0.6121687
## mar_15           -0.204499   0.339853 -0.6017 0.5473559
## hlth_ins_chd      0.070635   0.243309  0.2903 0.7715800
## wmn_litr         -0.245994   0.066597 -3.6938 0.0002209
## 
## Rho: -0.088531
## Asymptotic standard error: 0.21658
##     z-value: -0.40877, p-value: 0.68271
## Lambda: 0.89229
## Asymptotic standard error: 0.072113
##     z-value: 12.374, p-value: < 2.22e-16
## 
## LR test value: 13.441, p-value: 0.0012059
## 
## Log likelihood: -89.08712 for sac model
## ML residual variance (sigma squared): 4.7631, (sigma: 2.1825)
## Number of observations: 38 
## Number of parameters estimated: 12 
## AIC: 202.17, (AIC for lm: 211.62)

Estimates of SAC_1 model are bit convincing. We can see that spatial lag of Y is still insignificant. However, spatial error term is now significant as p-value is less then 0.05. Also, beta coefficients of full_immunization, brth_bf_18, pre_mtr_brth and wmn_litr are having a significant impact on our outcome variable i.e stunting rate. The AIC value for this spatial model is 202.17 in comparison to AIC value of lm model 211.62. This shows that our SAC_1 model is a better model. We should continue with spatial models.

2.3.3 Spatial Durbin Error Model (SDEM_1)

Includes spatial lags of X and spatial error term. The typical form is as follows:

SDEM: Y=β_0+Xβ+WXθ+u and u=λWu+e

SDEM_1<-errorsarlm(form, data=my_data, listw=cont.listw, etype="emixed") # with spatial lags of X
summary(SDEM_1)
## 
## Call:errorsarlm(formula = form, data = my_data, listw = cont.listw, 
##     etype = "emixed")
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -4.15869 -1.23798 -0.08817  1.09806  2.96930 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                         Estimate  Std. Error z value  Pr(>|z|)
## (Intercept)           97.4814709  15.7074916  6.2060 5.433e-10
## mass_media            -0.6475494   2.2751044 -0.2846  0.775932
## pre_mtr_brth          -0.2529745   0.6538243 -0.3869  0.698819
## brth_bf_18            -0.1229563   0.2277018 -0.5400  0.589205
## full_imunization      -0.1559551   0.0580224 -2.6878  0.007191
## cnt_mil_2              0.0611510   0.0669147  0.9139  0.360788
## mar_15                -1.5409521   2.6377805 -0.5842  0.559096
## hlth_ins_chd           6.9585099   1.6290979  4.2714 1.943e-05
## wmn_litr              -0.4046491   0.0941590 -4.2975 1.727e-05
## lag.mass_media        13.8324383   6.1077906  2.2647  0.023530
## lag.pre_mtr_brth      -2.2745548   1.2091296 -1.8812  0.059951
## lag.brth_bf_18         0.0537485   0.5501822  0.0977  0.922177
## lag.full_imunization  -0.0020529   0.1453266 -0.0141  0.988730
## lag.cnt_mil_2          0.1822518   0.1585322  1.1496  0.250300
## lag.mar_15            -8.7735459   6.0584289 -1.4482  0.147574
## lag.hlth_ins_chd     -21.5543963   3.1147994 -6.9200 4.517e-12
## lag.wmn_litr          -0.1140421   0.1749549 -0.6518  0.514506
## 
## Lambda: -1.1953, LR test value: 2.0726, p-value: 0.14996
## Asymptotic standard error: 0.18477
##     z-value: -6.469, p-value: 9.8637e-11
## Wald statistic: 41.848, p-value: 9.8637e-11
## 
## Log likelihood: -82.66378 for error model
## ML residual variance (sigma squared): 3.3625, (sigma: 1.8337)
## Number of observations: 38 
## Number of parameters estimated: 19 
## AIC: 203.33, (AIC for lm: 203.4)

The estimates of SDEM_1 indicate that our spatial lag of X i.e lag.mass_media, lag.pre_mtr_brth, and lag.hlth_ins_chd are significant. Also beta coefficients of full_immunization, hlth_ins_chd and wmn_litr are significant now. Furthermore, the p-value of spatial error term is quite close to 10% . The AIC value for spatial model is low in comparison to linear model. So we can say SDEM_1 model results are acceptable as we have few significant variables.

2.3.4 Spatial Error Model(SEM)

Spatial error model does not contain spatial lags of X but it only contains spatial error term. The typical form of spatial error model is as follows:

SEM: Y=β_0+Xβ+u and u=λWu+e

SEM_1<-errorsarlm(form, data=my_data, listw=cont.listw) # no spat-lags of X
summary(SEM_1)
## 
## Call:errorsarlm(formula = form, data = my_data, listw = cont.listw)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.4425 -1.1927 -0.2821  1.7860  4.8941 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                   Estimate Std. Error z value  Pr(>|z|)
## (Intercept)      55.476618   7.197232  7.7080 1.288e-14
## mass_media       -1.093482   1.686449 -0.6484 0.5167307
## pre_mtr_brth     -0.769011   0.513320 -1.4981 0.1341044
## brth_bf_18        0.428166   0.167539  2.5556 0.0105999
## full_imunization -0.151747   0.044910 -3.3789 0.0007278
## cnt_mil_2         0.015875   0.060677  0.2616 0.7936105
## mar_15           -2.901029   2.032040 -1.4276 0.1533945
## hlth_ins_chd      0.861392   1.664080  0.5176 0.6047102
## wmn_litr         -0.268250   0.061995 -4.3270 1.512e-05
## 
## Lambda: 0.85541, LR test value: 16.807, p-value: 4.1385e-05
## Asymptotic standard error: 0.07525
##     z-value: 11.368, p-value: < 2.22e-16
## Wald statistic: 129.22, p-value: < 2.22e-16
## 
## Log likelihood: -89.45699 for error model
## ML residual variance (sigma squared): 5.059, (sigma: 2.2492)
## Number of observations: 38 
## Number of parameters estimated: 11 
## AIC: 200.91, (AIC for lm: 215.72)

We could see that our beta coefficients of brh_bf_18, full_immunization and wmn_litr are significant. The spatial error term is also significant. And the AIC value for SEM_1 model is low as compared to lm model.The results of spatial error model are acceptable.

2.3.5 Spatial Durbin Model (SDM)

The spatial durbin model contains spatial lag of X and spatial lag of Y. The typical form of spatial durbin model is as follows:

SDM: Y=β_0+ρWY+Xβ+WXθ+e

SDM_1<-lagsarlm(form, data=my_data, listw=cont.listw, type="mixed") # with spatial lags of X
summary(SDM_1)
## 
## Call:lagsarlm(formula = form, data = my_data, listw = cont.listw, 
##     type = "mixed")
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -5.55395 -1.22996  0.19515  1.71541  3.35362 
## 
## Type: mixed 
## Coefficients: (asymptotic standard errors) 
##                       Estimate Std. Error z value  Pr(>|z|)
## (Intercept)          33.310526  21.584892  1.5432 0.1227742
## mass_media            0.010398   2.231111  0.0047 0.9962815
## pre_mtr_brth         -0.728742   0.529693 -1.3758 0.1688893
## brth_bf_18            0.162090   0.194955  0.8314 0.4057344
## full_imunization     -0.121698   0.045806 -2.6568 0.0078883
## cnt_mil_2             0.034524   0.059406  0.5812 0.5611307
## mar_15               -2.007476   2.223456 -0.9029 0.3665986
## hlth_ins_chd          1.375801   1.620735  0.8489 0.3959508
## wmn_litr             -0.311815   0.083725 -3.7243 0.0001959
## lag.mass_media        5.111181   5.584825  0.9152 0.3600914
## lag.pre_mtr_brth      0.031171   1.399718  0.0223 0.9822331
## lag.brth_bf_18       -0.584074   0.442520 -1.3199 0.1868743
## lag.full_imunization  0.180150   0.128454  1.4024 0.1607818
## lag.cnt_mil_2         0.055336   0.170427  0.3247 0.7454140
## lag.mar_15            3.724421   5.662343  0.6578 0.5106971
## lag.hlth_ins_chd     -7.207205   3.614008 -1.9942 0.0461256
## lag.wmn_litr          0.062623   0.157455  0.3977 0.6908352
## 
## Rho: 0.47607, LR test value: 5.0998, p-value: 0.023929
## Asymptotic standard error: 0.16298
##     z-value: 2.921, p-value: 0.0034896
## Wald statistic: 8.532, p-value: 0.0034896
## 
## Log likelihood: -81.1502 for mixed model
## ML residual variance (sigma squared): 3.9648, (sigma: 1.9912)
## Number of observations: 38 
## Number of parameters estimated: 19 
## AIC: 200.3, (AIC for lm: 203.4)
## LM test for residual autocorrelation
## test value: 6.2545, p-value: 0.012388

The above results of spatial durbin model are convincing as our spatial lag of Y (rho) is significant which means that we cannot interpret beta coefficients but must calculate direct and indirect effects and interpret them. The spatial lag of lag.hlth_ins_chd is significant.Beta cofficients of full_immmunization and wmn_litr both are significant. The AIC value is also low in comparison to lm model.

2.3.6 Spatial Autoregressive Model (SAR)

It normally includes spatial lag of Y only (with rho coefficient). Typical form is as follows: SAR: Y=β_0+ρWY+Xβ+e

SAR_1<-lagsarlm(form, data=my_data, listw=cont.listw) # no spatial lags of X
summary(SAR_1)
## 
## Call:lagsarlm(formula = form, data = my_data, listw = cont.listw)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -7.23523 -1.28302  0.31187  1.63389  4.73284 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##                   Estimate Std. Error z value  Pr(>|z|)
## (Intercept)      34.282853   9.308604  3.6829 0.0002306
## mass_media       -0.411414   2.193811 -0.1875 0.8512422
## pre_mtr_brth     -0.576823   0.564000 -1.0227 0.3064325
## brth_bf_18        0.412516   0.188286  2.1909 0.0284590
## full_imunization -0.106708   0.048841 -2.1848 0.0289022
## cnt_mil_2         0.018014   0.068422  0.2633 0.7923372
## mar_15           -2.030965   2.485041 -0.8173 0.4137707
## hlth_ins_chd     -1.743206   1.443994 -1.2072 0.2273505
## wmn_litr         -0.182014   0.075195 -2.4205 0.0154975
## 
## Rho: 0.48955, LR test value: 13.108, p-value: 0.00029408
## Asymptotic standard error: 0.11682
##     z-value: 4.1905, p-value: 2.7839e-05
## Wald statistic: 17.56, p-value: 2.7839e-05
## 
## Log likelihood: -91.30654 for lag model
## ML residual variance (sigma squared): 6.7421, (sigma: 2.5966)
## Number of observations: 38 
## Number of parameters estimated: 11 
## AIC: 204.61, (AIC for lm: 215.72)
## LM test for residual autocorrelation
## test value: 0.50926, p-value: 0.47546

Above results of SAR_1 model indicate that beta coefficients of brth_bf_18, full_imunization and wmn_litr are significant. It further indicates that spatial lag of Y (rho) is also significant which leads us to estimate direct and indirect impacts of our explanatory variables as we cannot interpret beta coefficients. The AIC value of SAR_1 model is also low as compared to AIC value of lm model.

2.3.7 Spatial Lag of X Model (SLX)

An ‘lm’ model augmented with the spatially lagged RHS variables only. The form of SLX model is as follows

SLX: Y=β_0+Xβ+WXθ+e

# from errorsarlm() library
# an ‘lm’ model augmented with the spatially lagged RHS variables
# RHS variables – right-hand side variables
SLX_1<-lmSLX(form, data=my_data, listw=cont.listw)
summary(SLX_1)
## 
## Call:
## lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))), 
##     data = as.data.frame(x), weights = weights)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.058 -1.514  0.078  1.592  3.988 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)   
## (Intercept)           67.84708   28.96296   2.343  0.02908 * 
## mass_media             1.02982    3.27203   0.315  0.75607   
## pre_mtr_brth          -0.58868    0.78133  -0.753  0.45955   
## brth_bf_18             0.07322    0.28765   0.255  0.80155   
## full_imunization      -0.11176    0.06723  -1.662  0.11129   
## cnt_mil_2              0.05161    0.08780   0.588  0.56294   
## mar_15                -1.89569    3.28529  -0.577  0.57006   
## hlth_ins_chd           1.75854    2.38932   0.736  0.46987   
## wmn_litr              -0.35593    0.12349  -2.882  0.00891 **
## lag.mass_media         7.61547    8.20644   0.928  0.36396   
## lag.pre_mtr_brth      -0.63442    2.06658  -0.307  0.76187   
## lag.brth_bf_18        -0.24965    0.64203  -0.389  0.70131   
## lag.full_imunization   0.11489    0.18933   0.607  0.55047   
## lag.cnt_mil_2          0.07750    0.25206   0.307  0.76150   
## lag.mar_15             0.46448    8.36334   0.056  0.95624   
## lag.hlth_ins_chd     -11.81202    5.15184  -2.293  0.03229 * 
## lag.wmn_litr          -0.07832    0.22109  -0.354  0.72668   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.945 on 21 degrees of freedom
## Multiple R-squared:  0.9135, Adjusted R-squared:  0.8476 
## F-statistic: 13.86 on 16 and 21 DF,  p-value: 9.691e-08

The results of SLX_1 model suggest that not allot of variables have a significant impact on the dependent variable.

As we already mentioned that in the presence of spatially lagged variable WY, the parameters associated with explanatory variables are not interpreted as in the usual framework of linear models. This is because due to spatial interactions, the variation of an explanatory variable for a given zone directly effects its result and indirectly effects results of all other zones or regions. The estimated parameters are then used to calculate a multiplier effect that is global in that it affects whole of the sample.

In our case we have the significant spatially lagged of dependent variable so we cannot interpret beta coefficients of independent variables we need to calculate direct and indirect effects. Lets calculate direct and indirect effects of SAR_1 model. Following lines of code will calculate direct, indirect and total effects.

# distribution of total impact 
# vector of traces of powers of a spatial weights matrix
# converting censored listw to CsparseMatrix form
W.c<-as(as_dgRMatrix_listw(cont.listw), "CsparseMatrix") 
# the default values for the number of powers is 30
trMat<-trW(W.c, type="mult") 
SAR_1_imp<-impacts(SAR_1, tr=trMat, R=2000)
summary(SAR_1_imp, zstats=TRUE, short=TRUE)
## Impact measures (lag, trace):
##                       Direct    Indirect       Total
## mass_media       -0.43904158 -0.36693981 -0.80598139
## pre_mtr_brth     -0.61555927 -0.51446881 -1.13002808
## brth_bf_18        0.44021819  0.36792319  0.80814138
## full_imunization -0.11387358 -0.09517265 -0.20904623
## cnt_mil_2         0.01922374  0.01606671  0.03529045
## mar_15           -2.16735203 -1.81141781 -3.97876984
## hlth_ins_chd     -1.86026950 -1.55476603 -3.41503553
## wmn_litr         -0.19423643 -0.16233788 -0.35657430
## ========================================================
## Simulation results ( variance matrix):
## ========================================================
## Simulated standard errors
##                      Direct   Indirect     Total
## mass_media       2.37168487 2.62889223 4.8781138
## pre_mtr_brth     0.60039886 0.66346720 1.2135284
## brth_bf_18       0.20146241 0.29073479 0.4491040
## full_imunization 0.05172408 0.07600667 0.1172025
## cnt_mil_2        0.07431615 0.08250273 0.1526507
## mar_15           2.62266205 3.08692312 5.4751871
## hlth_ins_chd     1.55838299 1.82278685 3.2199029
## wmn_litr         0.07939355 0.10135072 0.1574974
## 
## Simulated z-values:
##                      Direct   Indirect      Total
## mass_media       -0.1798359 -0.1921733 -0.1909994
## pre_mtr_brth     -1.0312824 -0.8224187 -0.9598693
## brth_bf_18        2.2045216  1.4231074  1.9101923
## full_imunization -2.2180862 -1.4081678 -1.8920974
## cnt_mil_2         0.2653993  0.1979012  0.2361655
## mar_15           -0.8320972 -0.6737172 -0.7784251
## hlth_ins_chd     -1.1794097 -0.8926694 -1.0761561
## wmn_litr         -2.4564787 -1.6899115 -2.3257674
## 
## Simulated p-values:
##                  Direct   Indirect Total   
## mass_media       0.857281 0.847606 0.848526
## pre_mtr_brth     0.302408 0.410839 0.337121
## brth_bf_18       0.027488 0.154705 0.056108
## full_imunization 0.026549 0.159081 0.058478
## cnt_mil_2        0.790702 0.843122 0.813304
## mar_15           0.405354 0.500491 0.436318
## hlth_ins_chd     0.238235 0.372034 0.281857
## wmn_litr         0.014031 0.091045 0.020031

From above results we can see that total effects (direct + indirect) of brth_bf-18, full_immunization and wmn_litr are statistically significant. Direct effects mean that if brth_bf-18 increases by one percent stunting rate would increase by 44 percent in the same region. However, for indirect effects of brth_bf_18 increases by one percent then stunting rate would increase by 36 percent in all other regions. The total effect is approximately an increase in 80 percent.

2.4 Comparison of Models LR Test

# LR (likelihood ratio) test - compares nested restricted model
# H0 – restricted (narrower) model is better
# H1 – unrestricted (wider) model is better
# df in chi2 is the number of restricted parameters

LR.Sarlm(SDM_1, SAR_1)
## 
##  Likelihood ratio for spatial linear models
## 
## data:  
## Likelihood ratio = 20.313, df = 8, p-value = 0.009216
## sample estimates:
## Log likelihood of SDM_1 Log likelihood of SAR_1 
##               -81.15020               -91.30654

We can make a comparison among models whether restricted model is better or the unrestricted one. Log likelihood ratio test helps us in this regard.Keep in mind we cannot randomly prepare two spatial models we need to follow sequence of arrows that describes connections among the models. For the purpose of our study lets compare spatial durbin model and spatial auto regressive model. The output shows that unrestricted model is better in this case as p-value is low adn we reject the null hypothesis. Here we should opt durbin model.

2.5 Conclusion

The objective of this study was to observe spatial distribution and spillover effects of stunting rate at the district level. We estimated both linear regression and spatial models. The advantage of using spatial model is that it helps us in getting more precise results and reducing overbais in estimation. Furthermore, one can calculate spatial spillover effects.

Our results suggest that there exist spatial dependency in space among the stunting rates at the district level as proved by Moron statistics and scatter plot. Additionally, in few models the spatial lag of Y is significant suggesting that there exist spatial spillover effects. We also observed that spatial error term also bears significant relationship means some unobserved features such as people attitude towards change,risk or features related with religion and culture etc are spatially dependent and influence our dependent variable.Above all, results of spatial model are convincing and feasible in comparison to lm model. For future research, one could possible include data related to economic variables and try to estimate also the spatial temporal effects.