Introduction

In developing countries, nutritional deficiency is the main cause of under- five mortality. Around, 6.3 million children of under five years of age die every year. In Pakistan, situation is even more alarming roughly about 38 percent of children under five years of age are stunted. Incomplete or no immunization puts the children at risk of infection from diseases like TB, measles, polio and other vaccine preventable diseases. In 2008, it was estimated that 17 percent of all the deaths among 0-59 months of children were vaccine preventable. Maintaining an improved child health status becomes a key challenge for majority of developing and lower middle income countries as it plays an important role in the social and capital growth of the country (UNICEF, 2019).

There exist an abundant literature on examining the link between child hood stunting and other predictors with the help of using statistical and econometric modeling. However, the purpose of this project is to employ advance visualize tools (GGplot) to gather and present insights in a visual manner rather then using statistical techniques. Lets first load the data before diving further.

data <- read_excel("data.xlsx")
names(data)
## [1] "Districts"    "Region"       "ID_3"         "Mass_Media"   "Immunization"
## [6] "Stunting"     "Marriage_18"  "Wmn_Literacy" "Climate_Vun"

Data Sources and Description

Data related to all variables have been retrieved from Multiple Indicator Cluster Survey (MICS, 2017-18).Its a household survey that aimed at providing district level estimates on population, health, nutrition and other demographic and socioeconomic indicators.Selection of variables was done by keeping in view the previous studies and there importance with the outcome variable i.e stunting. The shape files to create maps were obtained from DIVA-GIS. All districts of province Punjab are a part of this study.Following are the variables which have been used in this study along with their description.

Districts: Districts of province punjab

Region: Districts are categorized into South, North and Central Punjab region

Mass_Media: Women exposure to mass media

Immunization: Percentage of children who received immunization

Stunting: Child Hood Stunting rate

Marriage_15: Marriage before age 15

Wmn_Literacy: Women Literacy

Climate_Vun: Climate vulnerability, Regions and Districts are categorized into low, medium and high levels based on their exposure to climate change.

Lets have a look at our data.

glimpse(data)
## Rows: 38
## Columns: 9
## $ Districts    <chr> "Bahawalnagar", "Bahawalpur", "Rahimyar Khan", "Dera Ghaz~
## $ Region       <chr> "South", "South", "South", "South", "South", "South", "So~
## $ ID_3         <dbl> 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 9~
## $ Mass_Media   <dbl> 0.5, 1.0, 0.3, 0.2, 0.3, 0.9, 1.0, 1.6, 0.3, 1.0, 1.2, 1.~
## $ Immunization <dbl> 60.0, 61.2, 62.3, 44.1, 72.9, 65.1, 47.0, 59.8, 61.8, 67.~
## $ Stunting     <dbl> 36.8, 39.4, 46.1, 46.4, 29.6, 39.2, 47.4, 28.8, 35.1, 29.~
## $ Marriage_18  <dbl> 5.5, 3.7, 6.4, 8.4, 3.5, 8.8, 12.1, 2.5, 6.7, 3.3, 2.1, 3~
## $ Wmn_Literacy <dbl> 46.4, 47.1, 41.3, 35.3, 45.9, 33.1, 26.1, 66.6, 41.3, 63.~
## $ Climate_Vun  <chr> "medium", "medium", "high", "high", "high", "high", "high~

As we can see we have both charaters and double. Lets convert characters to factors.

data$Districts <- as.factor(data$Districts)
data$floods_cat <- as.factor(data$Climate_Vun)
data$Region <- as.factor(data$Region)
head(data)
## # A tibble: 6 x 10
##   Districts      Region  ID_3 Mass_Media Immunization Stunting Marriage_18
##   <fct>          <fct>  <dbl>      <dbl>        <dbl>    <dbl>       <dbl>
## 1 Bahawalnagar   South     77        0.5         60       36.8         5.5
## 2 Bahawalpur     South     78        1           61.2     39.4         3.7
## 3 Rahimyar Khan  South     79        0.3         62.3     46.1         6.4
## 4 Dera Ghazi Kha South     80        0.2         44.1     46.4         8.4
## 5 Layyah         South     81        0.3         72.9     29.6         3.5
## 6 Muzaffargarh   South     82        0.9         65.1     39.2         8.8
## # ... with 3 more variables: Wmn_Literacy <dbl>, Climate_Vun <chr>,
## #   floods_cat <fct>

Visualization Part

The first step in any study is to visualize distributions of variables. It informs us what sort of a data we are dealing with either we have normally distributed data or left and rightly skewed. It also helps us to identify relationships among variables.Lets first start with visualizing distributions of variables.

par(mfrow=c(2,3))
hist(data$Mass_Media,
main="Women Exposure to Mass Media",
xlab="Mass Media",
col="red",
freq=FALSE
)

hist(data$Immunization,
main="Children Immunization",
xlab="Immunization",
col="red",
freq=FALSE
)

hist(data$Stunting,
main="Child Hood Stunting",
xlab="Stunting",
col="red",
freq=FALSE
)

hist(data$Marriage_18,
main="Marraige Before the Age of 18",
xlab="Marriage before  age 18",
col="red",
freq=FALSE
)

hist(data$Wmn_Literacy,
main="Women Literacy Rate",
xlab="Women Literacy",
col="red",
freq=FALSE
)

We can see above that majority of the distributions are not normal except child immunization rate.As we are trying to analyze the determinants of child hood stunting lets now see the correlations of numeric features.

M <-cor(data[,c(4:8)])
corrplot(M, type="upper", order="hclust",
         col=brewer.pal(n=8, name="RdYlBu"))

We can see in the above correlation plot that women literacy rate, mass media exposure and immunization are negatively associated with stunting. However, marriage before age 18 positively correlates with child hood stunting. Now lets try to examine if we observe any variation in child hood stunting with respect to climate vulnerability.

boxplot(Stunting~Climate_Vun,data=data, notch=TRUE, 
        col=(c("gold","darkgreen")),
        main="Child Hood Stunting with respect to Climate Vulnerability",
   xlab="Climate Vulnerability", ylab="Child Hood Stunting")
## Warning in (function (z, notch = FALSE, width = NULL, varwidth = FALSE, : some
## notches went outside hinges ('box'): maybe set notch=FALSE

From the above graph we can conclude that climate vulnerability does play a significant role as we observe higher level stunting rate for high climate vulnerability level. For low climate vulnerability we observe lower rate of stunting. Now lets observe child hood stunting across districts and region separately.

ggplot(data, 
       aes(x=Stunting, 
           y=reorder(Districts, Stunting))) +
  geom_point(color="blue", 
             size = 2) +
  geom_segment(aes(x = 60, 
               xend = Stunting, 
               y = reorder(Districts, Stunting), 
               yend = reorder(Districts, Stunting)),
               color = "lightgrey") +
  labs (x = "Child Hood Stunting",
        y = " Districts",
        title = "District Wise Child Hood Stunting",
        subtitle = "Province Punjab 2017-2018 ") +
  theme_minimal() + 
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())

In the above plot districts have been arranged in descending order with rajan pur being the highest in terms of child hood stunting and gujrat being the lowest. Now lets see the variation across regions.

ggplot(data, 
       aes(x =Region,
           y = Stunting, 
           color = Region)) +
  geom_quasirandom(alpha = 0.7,
                   size = 1.5) + 
  scale_y_continuous(name="Child Hood Stunting",
                         breaks = seq(0, 80, by = 10),
                         labels = paste0(format(seq(0, 80, by = 10), decimal.mark = ","),"%"))+
  labs(title = "Child Hood Stunting by Region", 
       subtitle = "For the Year 2017-2018",
       x = "Region",
       y = "") +
  theme_minimal() +
  theme(legend.position = "none")

From the above plot we can see that south region tends to bear higher rate of childhood stunting in comparison to central and north region. Lets try to conclude above all plots before going deep into further analysis;

Distributions are not normal Both positive and negative correlation exist Child hood stunting varies with respect to change in climate vulnerability with high vulnerability corresponds to higher rate of child hood stunting. South region possess the higher stunting rate.

Now lets try to gain more insights as why child hood stunting varies across districts and regions. What are the potential reasons? For this we will visualize each feature with our target variable (Stunting).

ggplot(data = data, aes(x=Wmn_Literacy, y=Stunting))+labs(x = "Women Literacy",
       y = "Child Hood Stunting ",
       title = "Child Hood Stunting vs Women Literacy")+geom_point()+
scale_x_continuous(name = "Women Literacy", 
                        breaks = seq(0,100,by=10),
                        labels = paste0(format(seq(0, 100, by = 10), decimal.mark = ","),"%")) +
  scale_y_continuous(name="Child Hood Stunting",
                         breaks = seq(0, 100, by = 10),
                         labels = paste0(format(seq(0, 100, by = 10), decimal.mark = ","),"%"))+
                         theme_bw()

lets play with it. Lets add climate vulnerability as with levels low, medium and high and see if we observe more meaningful insights.

z <- ggplot(data = data, aes(x = Wmn_Literacy, y =Stunting )) +
  labs(title = "Child Hood Stunting vs Women Literacy")+ geom_point(data = data, 
               aes(color = Climate_Vun), 
               size = 2
)+scale_x_continuous(name = "Women Literacy", 
                        breaks = seq(0,100,by=10),
                        labels = paste0(format(seq(0, 100, by = 10), decimal.mark = ","),"%")) +
  scale_y_continuous(name="Child Hood Stunting",
                         breaks = seq(0, 100, by = 10),
                         labels = paste0(format(seq(0, 100, by = 10), decimal.mark = ","),"%"))+
                         guides(color = guide_legend(title = "Climate_Vulnerability"))
z

We can see in the above scatter plot that climate vulnerability does play an important role. As high level climate vulnerability tends to relate with higher stunting rate and lower women literacy. Therefore, we can conclude that climate vulnerability affects both the indicators.Instead of analyzing separately each relationship lets visualize mutliple graphs at once and draw some conclusions.

p1 <- ggplot(data) + 
      geom_point(aes(x = Mass_Media, y = Stunting, color = Climate_Vun)) +
      labs(x = 'Mass Media',
           y = 'Child hood Stunting')
p2 <- ggplot(data) + 
      geom_point(aes(x = Immunization, y = Stunting, color = Climate_Vun)) +
      labs(x = 'Immunization',
           y = 'Child hood Stunting')
p3 <- ggplot(data) + 
      geom_point(aes(x = Marriage_18, y =Stunting, color = Climate_Vun))  +
      labs(x = 'Marraige before 18',
           y = ' Child hood Stuting ')
p4 <- ggplot(data) + 
      geom_point(aes(x = Wmn_Literacy, y = Stunting, color = Climate_Vun))  +
      labs(x = 'Women Literacy',
           y = 'Child hood Stunting')

p4 <- p4 + theme(legend.position = "bottom") # position the legend in the desired way
legend <- get_legend(p4) # extract legend from the object
legend
## TableGrob (5 x 5) "guide-box": 2 grobs
##                                     z     cells                  name
## 99_50372d80479296ec7eecf5ca2077ea0a 1 (3-3,3-3)                guides
##                                     0 (2-4,2-4) legend.box.background
##                                               grob
## 99_50372d80479296ec7eecf5ca2077ea0a gtable[layout]
##                                     zeroGrob[NULL]
grid.arrange(arrangeGrob(p1 + theme(legend.position = "none"), # first plot (no legend)
                         p2 + theme(legend.position = "none"), # second plot (no legend)
                         p3 + theme(legend.position = "none"), # third plot (no legend)
                         p4 + theme(legend.position = "none"), # fourth plot (no legend)
                         ncol = 2), # number of columns
             legend, 
             nrow = 2, # number of rows for the grid.arrange function
             top = " Analysis of Child Hood Stunting across Multidimensions ",  # title
             heights = c(10, 1) # relative height of two rows
) 

We got mixed results. We observe two quite clear relationships from the above plots. First we can see that pre mature marriage or marriage before age 15 relates positively with child hood stunting and women literacy relates negatively with stunting.Furthermore, climate vulnerability also tends to negatively affect women literacy and mass media exposure.

Lets visualize region wise variations.

z1 <- ggplot(data) + 
      geom_point(aes(x = Mass_Media, y = Stunting, color = Region)) +
      labs(x = 'Mass Media',
           y = 'Child hood Stunting')
z2 <- ggplot(data) + 
      geom_point(aes(x = Immunization, y = Stunting, color = Region)) +
      labs(x = 'Immunization',
           y = 'Child hood Stunting')
z3 <- ggplot(data) + 
      geom_point(aes(x = Marriage_18, y =Stunting, color = Region))  +
      labs(x = 'Marriage before 18',
           y = ' Child hood Stuting ')
z4 <- ggplot(data) + 
      geom_point(aes(x = Wmn_Literacy, y = Stunting, color = Region))  +
      labs(x = 'Women Literacy',
           y = 'Child hood Stunting')

z4 <- z4 + theme(legend.position = "bottom") # position the legend in the desired way
legend <- get_legend(z4) # extract legend from the object
legend
## TableGrob (5 x 5) "guide-box": 2 grobs
##                                     z     cells                  name
## 99_22fb2f520d9b795a02dde5c6ba3df7f9 1 (3-3,3-3)                guides
##                                     0 (2-4,2-4) legend.box.background
##                                               grob
## 99_22fb2f520d9b795a02dde5c6ba3df7f9 gtable[layout]
##                                     zeroGrob[NULL]
grid.arrange(arrangeGrob(z1 + theme(legend.position = "none"), # first plot (no legend)
                         z2 + theme(legend.position = "none"), # second plot (no legend)
                         z3 + theme(legend.position = "none"), # third plot (no legend)
                         z4 + theme(legend.position = "none"), # fourth plot (no legend)
                         ncol = 2), # number of columns
             legend, 
             nrow = 2, # number of rows for the grid.arrange function
             top = " Region Wise Child Hood Stunting ",  # title
             heights = c(10, 1) # relative height of two rows
) 

We can conclude south region tends to be the worst performing among all indicators in comparison to north and central region. It has not only higher stunting rate and higher rates of marriage before 18 but lower women literacy, mass media exposure and immunization rate.Now lets see try and visualize district wise estimates bases on climate vulnerability.

w <- ggplot(data = data, aes(x = Mass_Media, y =Stunting )) +labs(x = "Women Exposure to Mass Media",
       y = "Child Hood Stunting ",
       title = "District Wise Child Hood Stunting vs Mass Media Exposure")+ geom_point(data = data, 
               aes(color = Climate_Vun), 
               size = 2
)+scale_x_continuous(name = "Women Exposure to Mass Media", 
                        breaks = seq(0,100,by=10),
                        labels = paste0(format(seq(0, 100, by = 10), decimal.mark = ","),"%")) +
  scale_y_continuous(name="Child Hood Stunting",
                         breaks = seq(0, 100, by = 10),
                         labels = paste0(format(seq(0, 100, by = 10), decimal.mark = ","),"%"))+
                         guides(color = guide_legend(title = "CLimate_Vulnerability")) +
                        geom_label_repel(aes(label = Districts), size = 2) + theme_minimal()
w

w1<- ggplot(data = data, aes(x = Immunization, y =Stunting )) +labs(x = "Child Immunization",
       y = "Child Hood Stunting ",
       title = "District Wise Child Hood Stunting vs Child Immunization")+ geom_point(data = data, 
               aes(color = Climate_Vun), 
               size = 2
)+scale_x_continuous(name = "Child Immunization", 
                        breaks = seq(0,100,by=10),
                        labels = paste0(format(seq(0, 100, by = 10), decimal.mark = ","),"%")) +
  scale_y_continuous(name="Child Hood Stunting",
                         breaks = seq(0, 100, by = 10),
                         labels = paste0(format(seq(0, 100, by = 10), decimal.mark = ","),"%"))+
                         guides(color = guide_legend(title = "Climate_Vulnerability")) +
                        geom_label_repel(aes(label = Districts), size = 2) + theme_minimal()
w1

w2 <- ggplot(data = data, aes(x = Marriage_18, y =Stunting )) +labs(x = "Marriage before 18",
       y = "Child Hood Stunting ",
       title = "District Wise Child Hood Stunting vs Marriage befor Age 18")+ geom_point(data = data, 
               aes(color = Climate_Vun), 
               size = 2
)+scale_x_continuous(name = "Marriage before 18", 
                        breaks = seq(0,50,by=5),
                        labels = paste0(format(seq(0, 50, by = 5), decimal.mark = ","),"%")) +
  scale_y_continuous(name="Child Hood Stunting",
                         breaks = seq(0, 100, by = 10),
                         labels = paste0(format(seq(0, 100, by = 10), decimal.mark = ","),"%"))+
                         guides(color = guide_legend(title = "Climate_Vulnerability")) +
                        geom_label_repel(aes(label = Districts), size = 2) + theme_minimal()
w2

w3 <- ggplot(data = data, aes(x = Wmn_Literacy, y =Stunting )) +labs(x = "Women Literacy",
       y = "Child Hood Stunting ",
       title = "District Wise Child Hood Stunting vs Women Literacy")+ geom_point(data = data, 
               aes(color = Climate_Vun), 
               size = 2
)+scale_x_continuous(name = "Women Literacy", 
                        breaks = seq(0,100,by=10),
                        labels = paste0(format(seq(0, 100, by = 10), decimal.mark = ","),"%")) +
  scale_y_continuous(name="Child Hood Stunting",
                         breaks = seq(0, 100, by = 10),
                         labels = paste0(format(seq(0, 100, by = 10), decimal.mark = ","),"%"))+
                         guides(color = guide_legend(title = "Climate_Vulnerability")) +
                        geom_label_repel(aes(label = Districts), size = 2) + theme_minimal()
w3
## Warning: ggrepel: 1 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

We can see the district wise estimates of child hood stunting with respect to other features. Additionally, to get them more clearer understanding we can plot the maps and add those features. Lets try that:

# shape file 
dis.sf<-st_read("PAK_adm3.shp")
## Reading layer `PAK_adm3' from data source 
##   `E:\2nd year courses\Visualization R\advance_R_Project\PAK_adm3.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 141 features and 13 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 60.89944 ymin: 23.70292 xmax: 77.84308 ymax: 37.09701
## Geodetic CRS:  WGS 84
ds.df <- as.data.frame(dis.sf)
#filtering out the districts of province punjab for our purpose
punjb <- dis.sf %>% filter(NAME_1 == "Punjab")
head(punjb, 3)
## Simple feature collection with 3 features and 13 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 69.73097 ymin: 27.6984 xmax: 73.96632 ymax: 30.35056
## Geodetic CRS:  WGS 84
##   ID_0 ISO   NAME_0 ID_1 NAME_1 ID_2     NAME_2 ID_3        NAME_3   TYPE_3
## 1  171 PAK Pakistan    7 Punjab   19 Bahawalpur   77  Bahawalnagar District
## 2  171 PAK Pakistan    7 Punjab   19 Bahawalpur   78    Bahawalpur District
## 3  171 PAK Pakistan    7 Punjab   19 Bahawalpur   79 Rahimyar Khan District
##   ENGTYPE_3 NL_NAME_3      VARNAME_3                       geometry
## 1  District      <NA>           <NA> MULTIPOLYGON (((72.7897 28....
## 2  District      <NA>      Bhawalpur MULTIPOLYGON (((71.8963 27....
## 3  District      <NA> Rahim Yar Khan MULTIPOLYGON (((71.12506 27...
#Taking the desired data to see spatial distribution of features across districts
my_data <- data[, c("ID_3", "Mass_Media", "Immunization", "Stunting","Marriage_18", "Wmn_Literacy", "Climate_Vun")]
punjb <- merge(punjb, data, by= "ID_3")

Now lets see the spatial distrbutions of our features across districts.

d1 <- ggplot() + 
      geom_sf(punjb, mapping=aes(geometry=geometry, fill=Stunting)) +
  scale_fill_gradient(low = "#fff7ec",  high = "#7F0000") +
      labs(x = 'Stunting')
d2 <- ggplot() + 
      geom_sf(punjb, mapping=aes(geometry=geometry, fill=Immunization)) +
  scale_fill_gradient(low = "#fff7ec",  high = "#7F0000") +
      labs(x = 'Immunization')
d3 <- ggplot() + 
      geom_sf(punjb, mapping=aes(geometry=geometry, fill=Mass_Media)) +
  scale_fill_gradient(low = "#fff7ec",  high = "#7F0000") +
      labs(x = 'Mass_Media')
d4 <- ggplot() + 
      geom_sf(punjb, mapping=aes(geometry=geometry, fill=Marriage_18)) +
  scale_fill_gradient(low = "#fff7ec",  high = "#7F0000") +
      labs(x = 'Marriage before 18')

d5 <- ggplot() + 
      geom_sf(punjb, mapping=aes(geometry=geometry, fill=Wmn_Literacy)) +
  scale_fill_gradient(low = "#fff7ec",  high = "#7F0000") +
      labs(x = 'Women Literacy')


design1 <- matrix(c(1, 2, 3, 4), # order of plots
                 2, # number of rows
                 2, # number of columns
                 byrow = TRUE # Whether the matrix is col (FALSE) or row (TRUE) major
)

multiplot(d1, d2, d3, d4, layout = design1)

These above maps are quite meaningful and astonishing as one could see we have districts where stunting rates are quite high but immunization coverage and mass media exposure quite low. Furthermore, they allow us to not only visualize spatial distributions but also to identify specific location based policy interventions. In our case we can see districts of south regions require more attention from government.