Spatial analysis of child mortality rate in Indian provinces

Introduction

Child mortality is one of those factors which, due to the issue of malnutrition, have been discussed quite extensively on a global level since the last decade or so. In the list of countries doing quite well in reducing the childhood mortality rates, India is placed very low. Even though the continuous monitoring and the policy making has helped in reducing the rates in the last decade, the issue still persists quite alarmingly. As a result, the focus always remain on analyzing and finding ways to reduce the impact of factors which lead to the under 5 year children deaths in India.

This study tries to analyze the relationship between various factors including availability of clean water, and smoking among others and the mortality rates. What this research also tries to do is to check if there is any spatial dependence and spatial correlations among Indian provinces when it comes to the under-5 child death rates. The study includes the use of various tools like Moran’s test, and the Join count test before building and fitting simple linear regression and spatial models to compare and check the impact of different variables involved.

Loading the necessary libraries/packages

library(ggplot2)
library(sf)
library(maps)
library(tidyverse)
library(readxl)
library(rgdal)
library(maptools)
library(sp)
library(RColorBrewer)
library(GISTools)
library(splm)
library(car)
library(lmtest)
library(Formula)
library(plm)
library(spdep)
library(spatialreg)
library(texreg)

Importing the data

The data has been collected from the National Family Health Survey (NFHS-5), 2019-21 report. This is a household based survey conducted with the support of International Institute for Population Sciences that aimed for providing data on province level indicators involving health, nutrition, and other socio-economic and demographic ones.

The data contains observations about 9 factors across the total 36 provinces and Union territories of India. A Union territory is an administrative division which, unlike the provinces, is governed completely or partially, by the Union Government of India. The 9 explanatory variables are as follows -

u5mortality - Early childhood (Under-5) mortality rate
antenatal_care - Maternal care indicators
fully_vaccinated - Vaccinations taken
violence - Spousal violence
smoker - Use of tobacco by the population age 15 and over
teenage_preg - Teenage pregnancy and motherhood
water - Access to improved source of drinking water (household %)
toilet_facility -
schooling_f - Level of schooling of more than 10 years for females (household %)

data <- read_excel("provinces.xlsx")

head(data)

## # A tibble: 6 x 11
##    ID_1 NAME_1       u5mortality antenatal_care fully_vaccinated violence smoker
##   <dbl> <chr>              <dbl>          <dbl>            <dbl>    <dbl>  <dbl>
## 1     1 Andaman and~        24.5           65.9             79.9     18.3   20.1
## 2     2 Andhra Prad~        35.2           46.8             73.2     33.4   29.1
## 3     3 Arunachal P~        18.8           14.4             64.9     26.6   70.6
## 4     4 Assam               39.1           26.9             66.7     34.3   57.2
## 5     5 Bihar               56.4            7.6             71       42.5   41  
## 6     6 Chandigarh          19.7           61.6             80.9     11.8   15.5
## # ... with 4 more variables: teenage_preg <dbl>, water <dbl>,
## #   toilet_facility <dbl>, schooling_f <dbl>

data1 <- data[-c(1,19),]          # dropping two island provinces

The two provinces named Andaman and Nicobar and Lakshwadeep are islands located in the Indian Ocean. The same have been dropped from the data as they do not share any land boundaries with other provinces, and this means including them will lead to unreliable results.

Importing the shape files

pro.sf <- st_read("IND_adm1.shp")

## Reading layer `IND_adm1' from data source `D:\UW\Summer semester 2021\Spatial\1\Project\IND_adm1.shp' using driver `ESRI Shapefile'
## Simple feature collection with 36 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 68.18625 ymin: 6.754256 xmax: 97.41516 ymax: 35.50133
## Geodetic CRS:  WGS 84

head(pro.sf)

## Simple feature collection with 6 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 76.69063 ymin: 6.754256 xmax: 97.41516 ymax: 30.79904
## Geodetic CRS:  WGS 84
##   ID_0 ISO NAME_0 ID_1              NAME_1         TYPE_1       ENGTYPE_1
## 1  105 IND  India    1 Andaman and Nicobar Union Territor Union Territory
## 2  105 IND  India    2      Andhra Pradesh          State           State
## 3  105 IND  India    3   Arunachal Pradesh          State           State
## 4  105 IND  India    4               Assam          State           State
## 5  105 IND  India    5               Bihar          State           State
## 6  105 IND  India    6          Chandigarh Union Territor Union Territory
##   NL_NAME_1
## 1      <NA>
## 2      <NA>
## 3      <NA>
## 4      <NA>
## 5      <NA>
## 6      <NA>
##                                                                                              VARNAME_1
## 1 Andaman & Nicobar Islands|Andaman et Nicobar|Iihas de Andama e Nicobar|Inseln Andamanen und Nikobare
## 2                                                                                                 <NA>
## 3                       Agence de la Frontière du Nord-Est(French-obsolete)|North East Frontier Agency
## 4                                                                                                 <NA>
## 5                                                                                                 <NA>
## 6                                                                                                 <NA>
##                         geometry
## 1 MULTIPOLYGON (((93.78773 6....
## 2 MULTIPOLYGON (((80.27458 13...
## 3 MULTIPOLYGON (((96.15778 29...
## 4 MULTIPOLYGON (((89.87145 25...
## 5 MULTIPOLYGON (((88.10548 26...
## 6 MULTIPOLYGON (((76.80293 30...

pro.sf.final <- pro.sf[-c(1,19),]

Here again, the same observations have been removed for further analysis. As it can be seen, the data has been ordered as per the shapefile format.

Plotting Indian provinces

plot1 <- ggplot() + geom_sf(pro.sf.final, mapping=aes(geometry=geometry, fill=NAME_1))+
  labs(title= "Indian administrative units at the provincial level")

plot1

Spatial distribution of mortality rates

# Filtering out the desired data to visualize spatial distribution of Childhood mortality rate across provinces
distdata <- data1[, c("ID_1", "u5mortality")]
pro.sf.final <- merge(pro.sf.final, distdata, by= "ID_1")

# Plotting the distribution
plot2 <- ggplot() + geom_sf(pro.sf.final, mapping=aes(geometry=geometry, fill=u5mortality)) + 
  scale_fill_gradient(low = "#d3e7e7",  high = "#300bdd") +
  labs(title= "Spatial Distribution of Under 5 mortality rate in Indian provinces 2019-21")
plot2

As visible from the above plot, the provinces of Uttar Pradesh and Bihar are found to have maximum child mortality rates in the country, while Goa and Kerala with the lowest rates.

Checking the distribution of variables

par(mfrow = c(3,3))
hist(data1$u5mortality)
hist(data1$antenatal_care)
hist(data$fully_vaccinated)
hist(data$violence)
hist(data$smoker)
hist(data$teenage_preg)
hist(data$water)
hist(data$toilet_facility)
hist(data$schooling_f)

More variables seemingly tend toward normality which is a good thing but for this analysis, we are not going to be focusing too much on this aspect.

Building the contiguity matrix

Before calculating the contiguity matrix, the data must be converted into a Spatial object.

# Converting into spatial object
support.sp <- as(pro.sf.final, "Spatial")

# Generating the matrix
cont.nb <- poly2nb(as(support.sp, "SpatialPolygons"))

## although coordinates are longitude/latitude, st_intersects assumes that they are planar

cont.listw <- nb2listw(cont.nb,  style="W")

cont.listw

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 34 
## Number of nonzero links: 132 
## Percentage nonzero weights: 11.41869 
## Average number of links: 3.882353 
## 
## Weights style: W 
## Weights constants summary:
##    n   nn S0       S1       S2
## W 34 1156 34 20.30275 152.8701

As per the above output, the average number of links, which means average number of provinces a particular province shares border with, is roughly 3.88.

Row standardized matrix

india.mt <- nb2mat(cont.nb)
india.mt[1:5, 1:5]

##   [,1]      [,2] [,3] [,4] [,5]
## 1    0 0.0000000  0.0    0    0
## 2    0 0.0000000  0.5    0    0
## 3    0 0.1428571  0.0    0    0
## 4    0 0.0000000  0.0    0    0
## 5    0 0.0000000  0.0    0    0

The above matrix contains the assigned weights to every province depending on the neighborhood structure.

Simple regression model (OLS)

lm.model <- lm(u5mortality~antenatal_care+fully_vaccinated+violence+
                 smoker+teenage_preg+water+toilet_facility+schooling_f, data=data1)
summary(lm.model)

## 
## Call:
## lm(formula = u5mortality ~ antenatal_care + fully_vaccinated + 
##     violence + smoker + teenage_preg + water + toilet_facility + 
##     schooling_f, data = data1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.1427  -4.0200  -0.7727   5.8895  14.1200 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)   
## (Intercept)      112.638365  39.999826   2.816  0.00935 **
## antenatal_care    -0.059518   0.154334  -0.386  0.70302   
## fully_vaccinated  -0.485757   0.276702  -1.756  0.09142 . 
## violence          -0.208686   0.189276  -1.103  0.28072   
## smoker            -0.006264   0.120097  -0.052  0.95882   
## teenage_preg      -0.656152   0.454916  -1.442  0.16161   
## water              0.440932   0.357495   1.233  0.22890   
## toilet_facility   -0.704111   0.253731  -2.775  0.01029 * 
## schooling_f       -0.647720   0.323979  -1.999  0.05656 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.477 on 25 degrees of freedom
## Multiple R-squared:  0.7011, Adjusted R-squared:  0.6054 
## F-statistic: 7.329 on 8 and 25 DF,  p-value: 5.336e-05

From the output of the model above, it can be seen that only one variable, i.e., toilet_facility is found to be statistically significant at the level of 5%. However, the reason behind building this model is not to check the significant variables but to check if there is a need for spatial models for the data we are trying to analyze or not.

Checking for the spatial correlation

res <- lm.model$residuals
lm.morantest(lm.model, cont.listw)           # Moran test

## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = u5mortality ~ antenatal_care + fully_vaccinated +
## violence + smoker + teenage_preg + water + toilet_facility +
## schooling_f, data = data1)
## weights: cont.listw
## 
## Moran I statistic standard deviate = 3.0391, p-value = 0.001186
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##       0.22648928      -0.09650817       0.01129536

moran.test(res, cont.listw) # Justification for spatial models

## 
##  Moran I test under randomisation
## 
## data:  res  
## weights: cont.listw    
## 
## Moran I statistic standard deviate = 2.0719, p-value = 0.01914
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##        0.22648928       -0.03030303        0.01536126

The test used above to check for the existence of spatial correlation is the Moran’s test. The null hypothesis (H0) for the test states that there is spatial randomness (no correlation), while the alternative hypothesis justifies existence of spatial correlation.

As per the results of the test in this case, p-value (~0.019) is very less than the 5% level of significance, which implies we reject the null hypothesis. This means social correlation exists and there is a need for spatial analysis for the problem.

Moran scatter plot

x <- data1$u5mortality # variable selection
zx <- as.data.frame(scale(x))  #standardization of variable

moran.plot(zx$V1, cont.listw, pch=19, labels=as.character(data1$NAME_1))

Plotting the moran’s graph makes it clear that the provinces with high Under-5 mortality rates are situated next to each other and the low ones with low. The figure also clearly depicts the result from plot2, i.e, Bihar and Uttar pradesh, two neighbouring provinces have the maximum mortality rates.

Mapping the results from moran’s plot

wzx <- lag.listw(cont.listw, zx$V1)     # spatial lag of x
morlm <- lm(wzx~zx$V1)                  # linear regression            
summary(morlm)

## 
## Call:
## lm(formula = wzx ~ zx$V1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.91229 -0.25113  0.06669  0.32540  1.13523 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.17344    0.07820   2.218   0.0338 *  
## zx$V1        0.38173    0.07938   4.809 3.46e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.456 on 32 degrees of freedom
## Multiple R-squared:  0.4195, Adjusted R-squared:  0.4014 
## F-statistic: 23.13 on 1 and 32 DF,  p-value: 3.458e-05

slope <- morlm$coefficients[2]          # coefficient from regression
intercept <- morlm$coefficients[1]      # constant term in regression

# Let's make the conditions for the distribution
cond1 <- ifelse(zx>=0 & wzx>=0, 1,0)    # I quarter
cond2 <- ifelse(zx>=0 & wzx<0, 2,0)     # II quarter
cond3 <- ifelse(zx<0 & wzx<0, 3,0)      # III quarter
cond4 <- ifelse(zx<0 & wzx>=0, 4,0)     # IV quarter
cond.all <- cond1+cond2+cond3+cond4     # combining all quarters in one

cond <- as.data.frame(cond.all)
cond

##    V1
## 1   2
## 2   4
## 3   2
## 4   1
## 5   4
## 6   1
## 7   1
## 8   1
## 9   4
## 10  3
## 11  1
## 12  1
## 13  4
## 14  3
## 15  1
## 16  3
## 17  3
## 18  1
## 19  4
## 20  3
## 21  1
## 22  4
## 23  2
## 24  1
## 25  3
## 26  2
## 27  1
## 28  3
## 29  3
## 30  4
## 31  2
## 32  1
## 33  1
## 34  4

brks <- c(1,2,3,4)
cols <- c("#003D30", "#006650", "#00F5C0", "#00B890")

plot(support.sp, col=cols[findInterval(cond$V1, brks)])
legend("bottomleft", legend=c("HH – high surrounded by high", 
                               "LH - low surrounded by high", 
                               "LL - low surrounded by low", 
                               "HL - high surrounded by low"), 
       fill=cols, bty="n", cex=0.80)
title(main="Mapping of Moranscatterplot results
Child mortality rate in 2019-21")

Join Count test

The aim to perform this test is to categorize the u5mortality variable into different groups (clusters) as per the severeness of mortality rates, and afterwards plot the map of India with the assigned colors.

# Defining range to assign clusters

fctr <- factor(cut(data1$u5mortality, breaks=c(0,20,40,60), 
                       labels=c("low", "medium", "high")))

# Graphical parameters

brks1<-c(0,20,40,60) 
cols<-c("green", "blue", "red")

# Plotting the scatter plot

plot(data1$u5mortality, bg=cols[findInterval(data1$u5mortality, brks1)], pch=21, ylab="Child Mortality Rate")
abline(h=c(20,40,60), lty=3)

The above chart shows the distribution of the variable “u5mortality” assigned to 3 different colors as per the range they belong to.

# Spatial distribution

plot(support.sp, col=cols[findInterval(data1$u5mortality, brks1)])
title(main="Child mortality in 2019-21")
legend("bottomleft", legend=c("low", "medium", "high"), leglabs(brks1), fill=cols, bty="n")

# Join-count test (based on colors)

joincount.test(fctr, cont.listw)

## 
##  Join count test under nonfree sampling
## 
## data:  fctr 
## weights: cont.listw 
## 
## Std. deviate for low = -0.80333, p-value = 0.7891
## alternative hypothesis: greater
## sample estimates:
## Same colour statistic           Expectation              Variance 
##             0.3333333             0.6363636             0.1422936 
## 
## 
##  Join count test under nonfree sampling
## 
## data:  fctr 
## weights: cont.listw 
## 
## Std. deviate for medium = 1.5111, p-value = 0.06538
## alternative hypothesis: greater
## sample estimates:
## Same colour statistic           Expectation              Variance 
##             6.3761905             5.1818182             0.6247368 
## 
## 
##  Join count test under nonfree sampling
## 
## data:  fctr 
## weights: cont.listw 
## 
## Std. deviate for high = 2.6843, p-value = 0.003634
## alternative hypothesis: greater
## sample estimates:
## Same colour statistic           Expectation              Variance 
##             1.9944444             0.8484848             0.1822492

The output of the join count test clearly shows that there is a significant association among provinces with high child mortality rates, i.e., similar colors locate next to each other more often than randomly. The result also depicts that there is no serious autocorrelation for the low and medium mortality rate provinces.

Building spatial models

# Storing the formula in a variable for further use
fm <- u5mortality~antenatal_care+fully_vaccinated+violence+smoker+teenage_preg+
  water+toilet_facility+schooling_f

Manski Model (full specification)

GNS <- sacsarlm(fm, data=data1, listw=cont.listw, type="sacmixed", method="LU")
summary(GNS)

## 
## Call:
## sacsarlm(formula = fm, data = data1, listw = cont.listw, type = "sacmixed", 
##     method = "LU")
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -7.56639 -3.67619  0.12385  2.89738 11.05194 
## 
## Type: sacmixed 
## Coefficients: (numerical Hessian approximate standard errors) 
##                       Estimate Std. Error z value  Pr(>|z|)
## (Intercept)          72.781788  60.205173  1.2089 0.2267028
## antenatal_care        0.059609   0.113310  0.5261 0.5988398
## fully_vaccinated     -0.097437   0.213602 -0.4562 0.6482724
## violence              0.037910   0.150154  0.2525 0.8006751
## smoker               -0.129135   0.101353 -1.2741 0.2026241
## teenage_preg         -0.603746   0.317851 -1.8995 0.0575038
## water                -0.019448   0.241549 -0.0805 0.9358270
## toilet_facility      -0.317909   0.223998 -1.4193 0.1558251
## schooling_f          -1.101855   0.282367 -3.9022 9.532e-05
## lag.antenatal_care    0.216314   0.216797  0.9978 0.3183890
## lag.fully_vaccinated -0.181182   0.461054 -0.3930 0.6943386
## lag.violence          0.582396   0.309235  1.8833 0.0596537
## lag.smoker            0.666563   0.177634  3.7525 0.0001751
## lag.teenage_preg     -1.771946   0.660936 -2.6810 0.0073410
## lag.water             0.037342   0.688233  0.0543 0.9567300
## lag.toilet_facility  -0.710452   0.575705 -1.2341 0.2171824
## lag.schooling_f       1.381878   0.683373  2.0221 0.0431615
## 
## Rho: 0.39459
## Approximate (numerical Hessian) standard error: 0.29832
##     z-value: 1.3227, p-value: 0.18593
## Lambda: -0.12965
## Approximate (numerical Hessian) standard error: 0.43323
##     z-value: -0.29927, p-value: 0.76474
## 
## LR test value: 27.301, p-value: 0.0023331
## 
## Log likelihood: -102.0364 for sacmixed model
## ML residual variance (sigma squared): 22.612, (sigma: 4.7552)
## Number of observations: 34 
## Number of parameters estimated: 20 
## AIC: 244.07, (AIC for lm: 251.37)

As we can see, the list of variables is quite long, however, the number of significant variables is only 4 (including 3 variables with lags). What is more noteworthy is that the AIC value for the model (244.07) is lower than the AIC value for simple regression (251.37). Clearly, there is a need to build some other models to compare which one would be the best.

SAC (or SARAR)

SAC <- sacsarlm(fm, data=data1, listw=cont.listw)
summary(SAC)

## 
## Call:sacsarlm(formula = fm, data = data1, listw = cont.listw)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -12.92930  -4.36219   0.48047   3.56741  14.75252 
## 
## Type: sac 
## Coefficients: (asymptotic standard errors) 
##                    Estimate Std. Error z value  Pr(>|z|)
## (Intercept)      124.100751  31.881864  3.8925 9.921e-05
## antenatal_care     0.012265   0.117044  0.1048   0.91654
## fully_vaccinated  -0.404578   0.203340 -1.9897   0.04663
## violence          -0.169598   0.155588 -1.0900   0.27569
## smoker            -0.184347   0.090260 -2.0424   0.04111
## teenage_preg      -0.305141   0.329804 -0.9252   0.35485
## water              0.030243   0.255520  0.1184   0.90578
## toilet_facility   -0.273600   0.220361 -1.2416   0.21438
## schooling_f       -1.164280   0.289722 -4.0186 5.854e-05
## 
## Rho: 0.15411
## Asymptotic standard error: 0.22978
##     z-value: 0.67068, p-value: 0.50243
## Lambda: 0.63364
## Asymptotic standard error: 0.19174
##     z-value: 3.3047, p-value: 0.00095065
## 
## LR test value: 9.1395, p-value: 0.01036
## 
## Log likelihood: -111.1173 for sac model
## ML residual variance (sigma squared): 35.481, (sigma: 5.9566)
## Number of observations: 34 
## Number of parameters estimated: 12 
## AIC: 246.23, (AIC for lm: 251.37)

Clearly, the results have improved as out of the 8 variables now, 3 are found to be statistically significant at the 5% level. Also, the AIC value, even though slightly, has increased to 246.23, however still remains lower than that of OLS model.

SDEM (Spatial Durbin Error Model)

SDEM <- errorsarlm(fm, data=data1, listw=cont.listw, etype="emixed") # with spatial lags of X
summary(SDEM)

## 
## Call:
## errorsarlm(formula = fm, data = data1, listw = cont.listw, etype = "emixed")
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -7.61232 -3.71294  0.04674  2.70920 11.49521 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                       Estimate Std. Error z value  Pr(>|z|)
## (Intercept)          96.336411  60.678420  1.5877 0.1123643
## antenatal_care        0.035512   0.112210  0.3165 0.7516409
## fully_vaccinated     -0.105656   0.217092 -0.4867 0.6264810
## violence              0.021886   0.139462  0.1569 0.8753011
## smoker               -0.087851   0.097598 -0.9001 0.3680542
## teenage_preg         -0.687226   0.315581 -2.1777 0.0294317
## water                 0.045328   0.243390  0.1862 0.8522592
## toilet_facility      -0.366735   0.222687 -1.6469 0.0995858
## schooling_f          -1.036141   0.280821 -3.6897 0.0002245
## lag.antenatal_care    0.160547   0.221742  0.7240 0.4690510
## lag.fully_vaccinated -0.312470   0.438620 -0.7124 0.4762219
## lag.violence          0.546807   0.313831  1.7424 0.0814446
## lag.smoker            0.608880   0.157338  3.8699 0.0001089
## lag.teenage_preg     -1.979808   0.633488 -3.1253 0.0017765
## lag.water             0.276374   0.659731  0.4189 0.6752751
## lag.toilet_facility  -0.804444   0.546004 -1.4733 0.1406623
## lag.schooling_f       0.956287   0.564725  1.6934 0.0903855
## 
## Lambda: 0.24118, LR test value: 0.59478, p-value: 0.44058
## Asymptotic standard error: 0.20216
##     z-value: 1.193, p-value: 0.23285
## Wald statistic: 1.4233, p-value: 0.23285
## 
## Log likelihood: -102.7502 for error model
## ML residual variance (sigma squared): 24.324, (sigma: 4.932)
## Number of observations: 34 
## Number of parameters estimated: 19 
## AIC: 243.5, (AIC for lm: 242.1)

As it can be seen from the output, SDEM model performed almost at par with the Manski model with same number of variables being significant. However, a major difference in both these models is the AIC value as the AIC value here in case of SDEM is higher than that of OLS.

SEM (Spatial Error Model)

SEM <- errorsarlm(fm, data=data1, listw=cont.listw) # no spat-lags of X
summary(SEM)

## 
## Call:errorsarlm(formula = fm, data = data1, listw = cont.listw)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -12.14558  -4.00544   0.48186   2.84130  14.82028 
## 
## Type: error 
## Coefficients: (asymptotic standard errors) 
##                    Estimate Std. Error z value  Pr(>|z|)
## (Intercept)      130.344610  29.957400  4.3510 1.355e-05
## antenatal_care     0.031948   0.114021  0.2802   0.77933
## fully_vaccinated  -0.420508   0.200618 -2.0961   0.03608
## violence          -0.181425   0.154466 -1.1745   0.24018
## smoker            -0.171460   0.088772 -1.9315   0.05342
## teenage_preg      -0.308322   0.325313 -0.9478   0.34325
## water              0.041718   0.253890  0.1643   0.86948
## toilet_facility   -0.288047   0.221945 -1.2978   0.19435
## schooling_f       -1.193533   0.286870 -4.1605 3.175e-05
## 
## Lambda: 0.69151, LR test value: 8.5755, p-value: 0.0034071
## Asymptotic standard error: 0.11808
##     z-value: 5.8565, p-value: 4.7266e-09
## Wald statistic: 34.299, p-value: 4.7266e-09
## 
## Log likelihood: -111.3993 for error model
## ML residual variance (sigma squared): 35.191, (sigma: 5.9322)
## Number of observations: 34 
## Number of parameters estimated: 11 
## AIC: 244.8, (AIC for lm: 251.37)

Comparing the results of first three models with the output of SEM model, it can be said that the performance of SEM model lies somewhere between that of SDEM and SARAR, with SARAR being the best one so far.

SDM (Spatial Durbin Model)

SDM <- lagsarlm(fm, data=data1, listw=cont.listw, type="mixed") # with spatial lags of X
summary(SDM)

## 
## Call:lagsarlm(formula = fm, data = data1, listw = cont.listw, type = "mixed")
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -7.67417 -3.64573  0.28149  2.89565 11.31768 
## 
## Type: mixed 
## Coefficients: (asymptotic standard errors) 
##                       Estimate Std. Error z value  Pr(>|z|)
## (Intercept)          77.684177  59.955121  1.2957  0.195077
## antenatal_care        0.051327   0.109850  0.4672  0.640326
## fully_vaccinated     -0.101675   0.211951 -0.4797  0.631435
## violence              0.026531   0.141176  0.1879  0.850932
## smoker               -0.120552   0.096178 -1.2534  0.210050
## teenage_preg         -0.617841   0.311698 -1.9822  0.047459
## water                -0.006221   0.240792 -0.0258  0.979389
## toilet_facility      -0.331651   0.220116 -1.5067  0.131885
## schooling_f          -1.086144   0.277955 -3.9076 9.321e-05
## lag.antenatal_care    0.209680   0.214101  0.9794  0.327407
## lag.fully_vaccinated -0.198294   0.443905 -0.4467  0.655089
## lag.violence          0.573016   0.310340  1.8464  0.064832
## lag.smoker            0.643151   0.156912  4.0988 4.153e-05
## lag.teenage_preg     -1.773865   0.623774 -2.8438  0.004458
## lag.water             0.061340   0.655918  0.0935  0.925492
## lag.toilet_facility  -0.699743   0.556663 -1.2570  0.208742
## lag.schooling_f       1.279310   0.600272  2.1312  0.033071
## 
## Rho: 0.33165, LR test value: 1.933, p-value: 0.16443
## Asymptotic standard error: 0.18723
##     z-value: 1.7714, p-value: 0.0765
## Wald statistic: 3.1377, p-value: 0.0765
## 
## Log likelihood: -102.0811 for mixed model
## ML residual variance (sigma squared): 23.06, (sigma: 4.8021)
## Number of observations: 34 
## Number of parameters estimated: 19 
## AIC: 242.16, (AIC for lm: 242.1)
## LM test for residual autocorrelation
## test value: 0.37861, p-value: 0.53835

The results here are a bit interesting because the number of significant variables has increased to 5 and also the AIC value is also very slightly higher than that of OLS. Moreover, the result of LM test shows that there is no serial correlation between residuals, which is a good thing. Surely, it seems that SDM is one of the better models.

SAR (Spatial Lag Model)

SAR <- lagsarlm(fm, data=data1, listw=cont.listw) # no spatial lags of X
summary(SAR)

## 
## Call:lagsarlm(formula = fm, data = data1, listw = cont.listw)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -17.606197  -4.465981  -0.030338   5.021473  12.333193 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##                   Estimate Std. Error z value Pr(>|z|)
## (Intercept)      89.826870  33.908199  2.6491  0.00807
## antenatal_care   -0.086166   0.122574 -0.7030  0.48208
## fully_vaccinated -0.348909   0.223514 -1.5610  0.11852
## violence         -0.106386   0.153649 -0.6924  0.48869
## smoker           -0.095516   0.095325 -1.0020  0.31635
## teenage_preg     -0.438030   0.361254 -1.2125  0.22531
## water             0.217527   0.282401  0.7703  0.44114
## toilet_facility  -0.422860   0.209928 -2.0143  0.04398
## schooling_f      -0.672705   0.261893 -2.5686  0.01021
## 
## Rho: 0.40502, LR test value: 4.5127, p-value: 0.033644
## Asymptotic standard error: 0.14686
##     z-value: 2.7578, p-value: 0.005819
## Wald statistic: 7.6055, p-value: 0.005819
## 
## Log likelihood: -113.4307 for lag model
## ML residual variance (sigma squared): 44.268, (sigma: 6.6534)
## Number of observations: 34 
## Number of parameters estimated: 11 
## AIC: 248.86, (AIC for lm: 251.37)
## LM test for residual autocorrelation
## test value: 3.1098, p-value: 0.077825

The results seems to be alright but not as good as the results of SDM and SARAR models. So, let’s move onto the last spatial model for this analysis which is the Spatial Lag of X Model (SLX).

SLX (Spatial Lag of X)

SLX <- lmSLX(fm, data=data1, listw=cont.listw)
summary(SLX)

## 
## Call:
## lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))), 
##     data = as.data.frame(x), weights = weights)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.1619 -3.5336 -0.0098  2.9391 11.0956 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)  
## (Intercept)          84.80913   86.72357   0.978   0.3418  
## antenatal_care        0.05057    0.16212   0.312   0.7589  
## fully_vaccinated     -0.08678    0.31254  -0.278   0.7846  
## violence              0.04832    0.20838   0.232   0.8194  
## smoker               -0.08980    0.14029  -0.640   0.5306  
## teenage_preg         -0.68040    0.45745  -1.487   0.1552  
## water                 0.04208    0.35397   0.119   0.9068  
## toilet_facility      -0.34256    0.32467  -1.055   0.3061  
## schooling_f          -1.06076    0.40998  -2.587   0.0192 *
## lag.antenatal_care    0.13510    0.31278   0.432   0.6712  
## lag.fully_vaccinated -0.31839    0.64856  -0.491   0.6298  
## lag.violence          0.57073    0.45387   1.257   0.2256  
## lag.smoker            0.66177    0.23017   2.875   0.0105 *
## lag.teenage_preg     -2.10728    0.91323  -2.308   0.0339 *
## lag.water             0.37830    0.95008   0.398   0.6955  
## lag.toilet_facility  -0.89070    0.81427  -1.094   0.2893  
## lag.schooling_f       1.09749    0.85120   1.289   0.2145  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.088 on 17 degrees of freedom
## Multiple R-squared:  0.8579, Adjusted R-squared:  0.7241 
## F-statistic: 6.413 on 16 and 17 DF,  p-value: 0.0002101

The output above clearly shows that the SLX model has performed fairly enough.

However, as there is not a single standout model, we will have to finalize the model on the basis of various things like AIC, variables and error significance.

Summary of models

screenreg(list(GNS, SAC, SDEM, SEM, SDM, SAR, SLX), custom.model.names=c("GNS", "SAC", "SDEM", "SEM", "SDM", "SAR", "SLX"))

## 
## ============================================================================================================
##                       GNS          SAC          SDEM         SEM          SDM          SAR         SLX      
## ------------------------------------------------------------------------------------------------------------
## (Intercept)             72.78       124.10 ***    96.34       130.34 ***    77.68        89.83 **    84.81  
##                        (60.21)      (31.88)      (60.68)      (29.96)      (59.96)      (33.91)     (86.72) 
## antenatal_care           0.06         0.01         0.04         0.03         0.05        -0.09        0.05  
##                         (0.11)       (0.12)       (0.11)       (0.11)       (0.11)       (0.12)      (0.16) 
## fully_vaccinated        -0.10        -0.40 *      -0.11        -0.42 *      -0.10        -0.35       -0.09  
##                         (0.21)       (0.20)       (0.22)       (0.20)       (0.21)       (0.22)      (0.31) 
## violence                 0.04        -0.17         0.02        -0.18         0.03        -0.11        0.05  
##                         (0.15)       (0.16)       (0.14)       (0.15)       (0.14)       (0.15)      (0.21) 
## smoker                  -0.13        -0.18 *      -0.09        -0.17        -0.12        -0.10       -0.09  
##                         (0.10)       (0.09)       (0.10)       (0.09)       (0.10)       (0.10)      (0.14) 
## teenage_preg            -0.60        -0.31        -0.69 *      -0.31        -0.62 *      -0.44       -0.68  
##                         (0.32)       (0.33)       (0.32)       (0.33)       (0.31)       (0.36)      (0.46) 
## water                   -0.02         0.03         0.05         0.04        -0.01         0.22        0.04  
##                         (0.24)       (0.26)       (0.24)       (0.25)       (0.24)       (0.28)      (0.35) 
## toilet_facility         -0.32        -0.27        -0.37        -0.29        -0.33        -0.42 *     -0.34  
##                         (0.22)       (0.22)       (0.22)       (0.22)       (0.22)       (0.21)      (0.32) 
## schooling_f             -1.10 ***    -1.16 ***    -1.04 ***    -1.19 ***    -1.09 ***    -0.67 *     -1.06 *
##                         (0.28)       (0.29)       (0.28)       (0.29)       (0.28)       (0.26)      (0.41) 
## lag.antenatal_care       0.22                      0.16                      0.21                     0.14  
##                         (0.22)                    (0.22)                    (0.21)                   (0.31) 
## lag.fully_vaccinated    -0.18                     -0.31                     -0.20                    -0.32  
##                         (0.46)                    (0.44)                    (0.44)                   (0.65) 
## lag.violence             0.58                      0.55                      0.57                     0.57  
##                         (0.31)                    (0.31)                    (0.31)                   (0.45) 
## lag.smoker               0.67 ***                  0.61 ***                  0.64 ***                 0.66 *
##                         (0.18)                    (0.16)                    (0.16)                   (0.23) 
## lag.teenage_preg        -1.77 **                  -1.98 **                  -1.77 **                 -2.11 *
##                         (0.66)                    (0.63)                    (0.62)                   (0.91) 
## lag.water                0.04                      0.28                      0.06                     0.38  
##                         (0.69)                    (0.66)                    (0.66)                   (0.95) 
## lag.toilet_facility     -0.71                     -0.80                     -0.70                    -0.89  
##                         (0.58)                    (0.55)                    (0.56)                   (0.81) 
## lag.schooling_f          1.38 *                    0.96                      1.28 *                   1.10  
##                         (0.68)                    (0.56)                    (0.60)                   (0.85) 
## rho                      0.39         0.15                                   0.33         0.41 **           
##                         (0.30)       (0.23)                                 (0.19)       (0.15)             
## lambda                  -0.13         0.63 ***     0.24         0.69 ***                                    
##                         (0.43)       (0.19)       (0.20)       (0.12)                                       
## ------------------------------------------------------------------------------------------------------------
## Num. obs.               34           34           34           34           34           34                 
## Parameters              20           12           19           11           19           11                 
## Log Likelihood        -102.04      -111.12      -102.75      -111.40      -102.08      -113.43     -103.05  
## AIC (Linear model)     251.37       251.37       242.10       251.37       242.10       251.37              
## AIC (Spatial model)    244.07       246.23       243.50       244.80       242.16       248.86              
## LR test: statistic      27.30         9.14         0.59         8.58         1.93         4.51              
## LR test: p-value         0.00         0.01         0.44         0.00         0.16         0.03              
## R^2                                                                                                   0.86  
## Adj. R^2                                                                                              0.72  
## Sigma                                                                                                 7.09  
## Statistic                                                                                             6.41  
## P Value                                                                                               0.00  
## DF                                                                                                   16.00  
## AIC                                                                                                 242.10  
## BIC                                                                                                 269.57  
## Deviance                                                                                            854.15  
## DF Resid.                                                                                            17     
## nobs                                                                                                 34     
## ============================================================================================================
## *** p < 0.001; ** p < 0.01; * p < 0.05

Looking at the output and comparing the results, it can be said that some of the spatially lagged variables are significant. This means it would be better if the lagged variables are taken into account for the spatial analysis of the problem. As per the above results, SDM and SDEM are seemingly the best models.

Conclusion

The objective of the research was to analyze the problem regarding childhood mortality and check if there is a scope of using spatial analysis methods. As we found out by the Moran’s test that there was indeed existence of spatial correlation, which meant spatial models would be better for more reliable and comprehensive analysis.

Moreover, the provinces on the central-eastern part of the country, especially Uttar Pradesh and Bihar, were found to be the most affected ones with the highest childhood mortality rates. As per the two best selected models, out of all the variables, teenage pregnancy rates of the provinces, schooling levels for females, lagged variable of smoker, among others were found to be statistically significant.