607 Final Project : Global Nutrition & Urbanization Correlation And Other Effects On Under 5 Development

Rationale

Nutrition is essential and World Health Organization has a baseline of the daily essential factors that human body needs to lead a healthy life.

As the economic, social and environmental conditions across the globe are not same, we observe disparity in the nutrition that the inhabitants of different countries receive.

While we know that poverty inflicted countries would suffer malnutrition, an attempt here is to understand how urbanization has impacted and how even those economies that are developing and well above poverty line are facing ‘Malnutrition’, which is under nutrition and over nutrition.

** Through this analysis we hope to find what affects the nutrition which is the so primary in healthy development of under 5 children**

Data

Environment Set Up

# Library for string manipulation/regex operations
library(stringr)
# Library for data display in tabular format
library(DT)
# Library to read text file
library(RCurl)
# Library to gather (to long format) and spread (to wide format) data, to tidy data
library(tidyr)
# Library to filter, transform data
library(dplyr)


# Library to plot
library(ggplot2)
library(knitr)


# Library for loading data from World Bank

library(WDI)

library(ggmap)

library(grid)

# options(gvis.print.tag='html')

Loading DataSets

The Nutrition Data has been sourced from Harvard Data Website.

Hardvard Nutrition Dataset

And Other complementary data has been sourced from World Bank Resources.

World Bank Data

Numerous Indicators are available , so some that have provoked thoughts in align with the nutrition data have been picked for analysis.

# Accessing World Bank Data through API The World Bank Data can be accessed through their API which is provided through R Package WDI.  A basic search
# by the following function WDIsearch('female.*emp*')[1:50,] gives the related indicators , one can choose from to load the World Bank data into R and
# do further analysis



# Urban % of Population

urbanPopdat <- WDI(indicator = "SP.URB.TOTL.IN.ZS", start = 2010, end = 2014)
# urbanPopdat <- na.omit(urbanPopdat)
urbanPopdat <- subset(urbanPopdat[, 2:4])
colnames(urbanPopdat) <- c("country", "urbanrate", "year")
# View(urbanPopdat)



# Indicator Of Women : Who Have Educational attainment, at least competed lower secondary, population 25+, total (%)


eduLowSecFemale <- WDI(indicator = "SE.SEC.CUAT.LO.FE.ZS", start = 2010, end = 2014)
eduLowSecFemale <- subset(eduLowSecFemale[, 2:4])
colnames(eduLowSecFemale) <- c("country", "feduval", "year")
# View(eduLowSecFemale)

# Indicator Of Women Working % Employment in services, female

empFemale <- WDI(indicator = "SL.SRV.EMPL.FE.ZS", start = 2010, end = 2014)
empFemale <- subset(empFemale[, 2:4])
colnames(empFemale) <- c("country", "fempval", "year")
# View(empFemale)

# Improved Water resources % population with access: SH.H2O.SAFE.ZS

watersrc <- WDI(indicator = "SH.H2O.SAFE.ZS", start = 2010, end = 2014)
watersrc <- subset(watersrc[, 2:4])
colnames(watersrc) <- c("country", "waterval", "year")
# View(watersrc)


# Accessing Nutrition Data 2015GNRDataset_Child-Adult-Nutrition

data.giturl <- "https://raw.githubusercontent.com/DataDriven-MSDA/DATA607/master/final/GNR.csv"
gnr.gitdata <- getURL(data.giturl)
gnr.csv <- read.csv2(text = gnr.gitdata, header = T, sep = ",", stringsAsFactors = FALSE)
# View(gnr.csv)

# Number Of Countries
nrow(gnr.csv)

## [1] 193

################################################################ 

# Subsetting variables from Nutrition DataSet

############################################################### 


gnrsub <- subset(gnr.csv, select = c(country, continent, regionUN, subregionUN, year_stunting_current, prev_stunting_current, number_stunting_current, 
    year_wasting, prev_wasting, number_wasting, year_sev_wasting, number_sev_wasting, prev_sev_wasting, year_u5overweight, prev_u5overweight, number_u5overweight, 
    year_lbw, LBW, year_stuntingtrend1, rate_stuntingtrend1, year_stuntingtrend2, rate_stuntingtrend2, year_stuntingtrend3, rate_stuntingtrend3, year_stuntingtrend4, 
    rate_stuntingtrend4, year_stuntingtrend5, rate_stuntingtrend5, year_heightBMI, prev_height145, year_WRAanaemia, WRAanaemia_RATE, WRAanaemia_NUMBER, 
    year_vitA_def, prevalence_vita, year_IodineNutrition, Class_IodineNutrition))


# View(gnrsub)

Data Cleansing And Transformation

Data Cleaning

Many Country name swere found a bit mismatching among the World Bank Data and The Nutrition Dataset. These were identified and made consistent among all the indicator data and nutrition data.

################################################## 

### Cleaning Data To Match Country Name Mismatch

################################################## 

# Cleaning Country names for Urban Population Data
xurban <- nrow(urbanPopdat)

for (i in 1:xurban) {
    if (urbanPopdat$country[i] == "Bahamas, The") {
        urbanPopdat$country[i] = "Bahamas"
    }
    if (urbanPopdat$country[i] == "Cabo Verde") {
        urbanPopdat$country[i] = "Cape Verde"
    }
    if (urbanPopdat$country[i] == "Congo, Dem. Rep.") {
        urbanPopdat$country[i] = "Democratic Republic of the Congo"
    }
    if (urbanPopdat$country[i] == "Congo, Rep.") {
        urbanPopdat$country[i] = "Congo Republic"
    }
    if (urbanPopdat$country[i] == "Egypt, Arab Rep.") {
        urbanPopdat$country[i] = "Egypt"
    }
    if (urbanPopdat$country[i] == "Gambia, The") {
        urbanPopdat$country[i] = "Gambia"
    }
    if (urbanPopdat$country[i] == "Iran, Islamic Rep.") {
        urbanPopdat$country[i] = "Iran"
    }
    if (urbanPopdat$country[i] == "Kyrgyz Republic") {
        urbanPopdat$country[i] = "Kyrgyzstan"
    }
    if (urbanPopdat$country[i] == "Korea, Dem. Peopleâ???Ts Rep.") {
        urbanPopdat$country[i] = "Democratic People's Republic of Korea"
    }
    if (urbanPopdat$country[i] == "Korea, Rep.") {
        urbanPopdat$country[i] = "Republic of Korea"
    }
    if (urbanPopdat$country[i] == "Micronesia, Fed. Sts.") {
        urbanPopdat$country[i] = "Micronesia (Federated States of)"
    }
    if (urbanPopdat$country[i] == "Moldova") {
        urbanPopdat$country[i] = "Republic of Moldova"
    }
    if (urbanPopdat$country[i] == "Slovak Republic") {
        urbanPopdat$country[i] = "Slovakia"
    }
    if (urbanPopdat$country[i] == "St. Kitts and Nevis") {
        urbanPopdat$country[i] = "Saint Kitts and Nevis"
    }
    if (urbanPopdat$country[i] == "St. Lucia") {
        urbanPopdat$country[i] = "Saint Lucia"
    }
    if (urbanPopdat$country[i] == "St. Vincent and the Grenadines") {
        urbanPopdat$country[i] = "Saint Vincent and the Grenadines"
    }
    if (urbanPopdat$country[i] == "Macedonia, FYR") {
        urbanPopdat$country[i] = "The former Yugoslav Republic of Macedonia"
    }
    if (urbanPopdat$country[i] == "Tanzania") {
        urbanPopdat$country[i] = "United Republic of Tanzania"
    }
    if (urbanPopdat$country[i] == "Venezuela, RB") {
        urbanPopdat$country[i] = "Venezuela"
    }
    if (urbanPopdat$country[i] == "Yemen, Rep.") {
        urbanPopdat$country[i] = "Yemen"
    }
    if (urbanPopdat$country[i] == "Yemen, Rep.") {
        urbanPopdat$country[i] = "Yemen"
    }
    if (urbanPopdat$country[i] == "United States") {
        urbanPopdat$country[i] = "United States of America"
    }
    if (urbanPopdat$country[i] == "Syrian Arab Republic") {
        urbanPopdat$country[i] = "Syria"
    }
    
    
    
}


# Cleaning Country Names :For Female Education Data

xedu <- nrow(eduLowSecFemale)
for (i in 1:xedu) {
    if (eduLowSecFemale$country[i] == "Bahamas, The") {
        eduLowSecFemale$country[i] = "Bahamas"
    }
    if (eduLowSecFemale$country[i] == "Cabo Verde") {
        eduLowSecFemale$country[i] = "Cape Verde"
    }
    if (eduLowSecFemale$country[i] == "Congo, Dem. Rep.") {
        eduLowSecFemale$country[i] = "Democratic Republic of the Congo"
    }
    if (eduLowSecFemale$country[i] == "Congo, Rep.") {
        eduLowSecFemale$country[i] = "Congo Republic"
    }
    if (eduLowSecFemale$country[i] == "Egypt, Arab Rep.") {
        eduLowSecFemale$country[i] = "Egypt"
    }
    if (eduLowSecFemale$country[i] == "Gambia, The") {
        eduLowSecFemale$country[i] = "Gambia"
    }
    if (eduLowSecFemale$country[i] == "Iran, Islamic Rep.") {
        eduLowSecFemale$country[i] = "Iran"
    }
    if (eduLowSecFemale$country[i] == "Kyrgyz Republic") {
        eduLowSecFemale$country[i] = "Kyrgyzstan"
    }
    if (eduLowSecFemale$country[i] == "Korea, Dem. Peopleâ???Ts Rep.") {
        eduLowSecFemale$country[i] = "Democratic People's Republic of Korea"
    }
    if (eduLowSecFemale$country[i] == "Korea, Rep.") {
        eduLowSecFemale$country[i] = "Republic of Korea"
    }
    if (eduLowSecFemale$country[i] == "Micronesia, Fed. Sts.") {
        eduLowSecFemale$country[i] = "Micronesia (Federated States of)"
    }
    if (eduLowSecFemale$country[i] == "Moldova") {
        eduLowSecFemale$country[i] = "Republic of Moldova"
    }
    if (eduLowSecFemale$country[i] == "Slovak Republic") {
        eduLowSecFemale$country[i] = "Slovakia"
    }
    if (eduLowSecFemale$country[i] == "St. Kitts and Nevis") {
        eduLowSecFemale$country[i] = "Saint Kitts and Nevis"
    }
    if (eduLowSecFemale$country[i] == "St. Lucia") {
        eduLowSecFemale$country[i] = "Saint Lucia"
    }
    if (eduLowSecFemale$country[i] == "St. Vincent and the Grenadines") {
        eduLowSecFemale$country[i] = "Saint Vincent and the Grenadines"
    }
    if (eduLowSecFemale$country[i] == "Macedonia, FYR") {
        eduLowSecFemale$country[i] = "The former Yugoslav Republic of Macedonia"
    }
    if (eduLowSecFemale$country[i] == "Tanzania") {
        eduLowSecFemale$country[i] = "United Republic of Tanzania"
    }
    if (eduLowSecFemale$country[i] == "Venezuela, RB") {
        eduLowSecFemale$country[i] = "Venezuela"
    }
    if (eduLowSecFemale$country[i] == "Yemen, Rep.") {
        eduLowSecFemale$country[i] = "Yemen"
    }
    if (eduLowSecFemale$country[i] == "Yemen, Rep.") {
        eduLowSecFemale$country[i] = "Yemen"
    }
    if (eduLowSecFemale$country[i] == "United States") {
        eduLowSecFemale$country[i] = "United States of America"
    }
    if (eduLowSecFemale$country[i] == "Syrian Arab Republic") {
        eduLowSecFemale$country[i] = "Syria"
    }
    
    
    
}

# Cleaning Country Names For Female Employment

xemp <- nrow(empFemale)

for (i in 1:xemp) {
    if (empFemale$country[i] == "Bahamas, The") {
        empFemale$country[i] = "Bahamas"
    }
    if (empFemale$country[i] == "Cabo Verde") {
        empFemale$country[i] = "Cape Verde"
    }
    if (empFemale$country[i] == "Congo, Dem. Rep.") {
        empFemale$country[i] = "Democratic Republic of the Congo"
    }
    if (empFemale$country[i] == "Congo, Rep.") {
        empFemale$country[i] = "Congo Republic"
    }
    if (empFemale$country[i] == "Egypt, Arab Rep.") {
        empFemale$country[i] = "Egypt"
    }
    if (empFemale$country[i] == "Gambia, The") {
        empFemale$country[i] = "Gambia"
    }
    if (empFemale$country[i] == "Iran, Islamic Rep.") {
        empFemale$country[i] = "Iran"
    }
    if (empFemale$country[i] == "Kyrgyz Republic") {
        empFemale$country[i] = "Kyrgyzstan"
    }
    if (empFemale$country[i] == "Korea, Dem. Peopleâ???Ts Rep.") {
        empFemale$country[i] = "Democratic People's Republic of Korea"
    }
    if (empFemale$country[i] == "Korea, Rep.") {
        empFemale$country[i] = "Republic of Korea"
    }
    if (empFemale$country[i] == "Micronesia, Fed. Sts.") {
        empFemale$country[i] = "Micronesia (Federated States of)"
    }
    if (empFemale$country[i] == "Moldova") {
        empFemale$country[i] = "Republic of Moldova"
    }
    if (empFemale$country[i] == "Slovak Republic") {
        empFemale$country[i] = "Slovakia"
    }
    if (empFemale$country[i] == "St. Kitts and Nevis") {
        empFemale$country[i] = "Saint Kitts and Nevis"
    }
    if (empFemale$country[i] == "St. Lucia") {
        empFemale$country[i] = "Saint Lucia"
    }
    if (empFemale$country[i] == "St. Vincent and the Grenadines") {
        empFemale$country[i] = "Saint Vincent and the Grenadines"
    }
    if (empFemale$country[i] == "Macedonia, FYR") {
        empFemale$country[i] = "The former Yugoslav Republic of Macedonia"
    }
    if (empFemale$country[i] == "Tanzania") {
        empFemale$country[i] = "United Republic of Tanzania"
    }
    if (empFemale$country[i] == "Venezuela, RB") {
        empFemale$country[i] = "Venezuela"
    }
    if (empFemale$country[i] == "Yemen, Rep.") {
        empFemale$country[i] = "Yemen"
    }
    if (empFemale$country[i] == "Yemen, Rep.") {
        empFemale$country[i] = "Yemen"
    }
    if (empFemale$country[i] == "United States") {
        empFemale$country[i] = "United States of America"
    }
    if (empFemale$country[i] == "Syrian Arab Republic") {
        empFemale$country[i] = "Syria"
    }
}


# Cleaning Country Names for Water Source Data


xwater <- nrow(watersrc)

for (i in 1:xwater) {
    if (watersrc$country[i] == "Bahamas, The") {
        watersrc$country[i] = "Bahamas"
    }
    if (watersrc$country[i] == "Cabo Verde") {
        watersrc$country[i] = "Cape Verde"
    }
    if (watersrc$country[i] == "Congo, Dem. Rep.") {
        watersrc$country[i] = "Democratic Republic of the Congo"
    }
    if (watersrc$country[i] == "Congo, Rep.") {
        watersrc$country[i] = "Congo Republic"
    }
    if (watersrc$country[i] == "Egypt, Arab Rep.") {
        watersrc$country[i] = "Egypt"
    }
    if (watersrc$country[i] == "Gambia, The") {
        watersrc$country[i] = "Gambia"
    }
    if (watersrc$country[i] == "Iran, Islamic Rep.") {
        watersrc$country[i] = "Iran"
    }
    if (watersrc$country[i] == "Kyrgyz Republic") {
        watersrc$country[i] = "Kyrgyzstan"
    }
    if (watersrc$country[i] == "Korea, Dem. Peopleâ???Ts Rep.") {
        watersrc$country[i] = "Democratic People's Republic of Korea"
    }
    if (watersrc$country[i] == "Korea, Rep.") {
        watersrc$country[i] = "Republic of Korea"
    }
    if (watersrc$country[i] == "Micronesia, Fed. Sts.") {
        watersrc$country[i] = "Micronesia (Federated States of)"
    }
    if (watersrc$country[i] == "Moldova") {
        watersrc$country[i] = "Republic of Moldova"
    }
    if (watersrc$country[i] == "Slovak Republic") {
        watersrc$country[i] = "Slovakia"
    }
    if (watersrc$country[i] == "St. Kitts and Nevis") {
        watersrc$country[i] = "Saint Kitts and Nevis"
    }
    if (watersrc$country[i] == "St. Lucia") {
        watersrc$country[i] = "Saint Lucia"
    }
    if (watersrc$country[i] == "St. Vincent and the Grenadines") {
        watersrc$country[i] = "Saint Vincent and the Grenadines"
    }
    if (watersrc$country[i] == "Macedonia, FYR") {
        watersrc$country[i] = "The former Yugoslav Republic of Macedonia"
    }
    if (watersrc$country[i] == "Tanzania") {
        watersrc$country[i] = "United Republic of Tanzania"
    }
    if (watersrc$country[i] == "Venezuela, RB") {
        watersrc$country[i] = "Venezuela"
    }
    if (watersrc$country[i] == "Yemen, Rep.") {
        watersrc$country[i] = "Yemen"
    }
    if (watersrc$country[i] == "Yemen, Rep.") {
        watersrc$country[i] = "Yemen"
    }
    if (watersrc$country[i] == "United States") {
        watersrc$country[i] = "United States of America"
    }
    if (watersrc$country[i] == "Syrian Arab Republic") {
        watersrc$country[i] = "Syria"
    }
}


# Cleaning Country Names For Nutrition Data

xgnrsub <- nrow(gnrsub)

for (i in 1:xgnrsub) {
    if (gnrsub$country[i] == "Lao People's Democratic Republic") {
        gnrsub$country[i] = "Lao PDR"
    }
    if (gnrsub$country[i] == "Congo (Republic of the)") {
        gnrsub$country[i] = "Congo Republic"
    }
    if (gnrsub$country[i] == "Viet Nam") {
        gnrsub$country[i] = "Vietnam"
    }
    
}


############################################################################### 

# The World Bank Data for various indicators is filtered for countries based on the countries available in the primary Nutrition dataset

############################################################################## 

urbanPopdatmatch <- urbanPopdat[urbanPopdat$country %in% gnrsub$country, ]

eduLowSecFemalematch <- eduLowSecFemale[which(eduLowSecFemale$country %in% gnrsub$country), ]

empFemalematch <- empFemale[which(empFemale$country %in% gnrsub$country), ]

watersrcmatch <- watersrc[which(watersrc$country %in% gnrsub$country), ]

# View(urbanPopdatmatch) View(empFemalematch) View(eduLowSecFemalematch) View(watersrcmatch)

Data Transformation For the World Bank Derived Indicator Data

######################################################################## Wide to long format for urban data to calculate avaerage urban population over years This is being done for 1. Since not all countries have recorded
######################################################################## values for each year , 2. to have an average of population over couple of years made sense.

urbanPopdatmatchavg <- urbanPopdatmatch %>% select(country, urbanrate) %>% group_by(country) %>% dplyr::summarise(avgurbval = mean(urbanrate, na.rm = TRUE))

urbanPopdatmatchwide <- urbanPopdatmatch %>% spread(year, urbanrate)
colnames(urbanPopdatmatchwide) <- c("country", "u2010", "u2011", "u2012", "u2013", "u2014")

# View(urbanPopdatmatchavg) View(urbanPopdatmatchwide)


urbanPopdatmatchnew <- merge(urbanPopdatmatchwide, urbanPopdatmatchavg, by = "country")
# View(urbanPopdatmatchnew)


######################################################################## 

# Data Trnasformation for deriving country based female education average

######################################################################## 


# eduLowSecFemalematch <- na.omit(eduLowSecFemalematch)
eduLowSecFemalematchnew <- eduLowSecFemalematch %>% select(country, feduval) %>% group_by(country) %>% dplyr::summarise(avgfeduval = mean(feduval, na.rm = TRUE))


# View(eduLowSecFemalematchnew)

######################################################################## 

# Data Trnasformation for deriving country based female employment average

########################################################################### 

# empFemalematch <- na.omit(empFemalematch)
empFemalematchnew <- empFemalematch %>% select(country, fempval) %>% group_by(country) %>% dplyr::summarise(avgfempval = mean(fempval, na.rm = TRUE))

# View(empFemalematchnew)

######################################################################### Data Trnasformation for deriving country based water source % average

######################################################################## 



# empFemalematch <- na.omit(empFemalematch)
watersrcmatchnew <- watersrcmatch %>% select(country, waterval) %>% group_by(country) %>% dplyr::summarise(avgwaterval = mean(waterval, na.rm = TRUE))

# View(empFemalematchnew)


######################################################## Merging World Bank data with Nutrition dataset

####################################################### 

gnrnew <- merge(gnrsub, urbanPopdatmatchnew, by = "country")
# View(gnrnew)

gnrnew <- merge(gnrnew, eduLowSecFemalematchnew, by = "country")
# View(gnrnew)


gnrnew <- merge(gnrnew, empFemalematchnew, by = "country")
# View(gnrnew)

gnrnew <- merge(gnrnew, watersrcmatchnew, by = "country")
# View(gnrnew)

Adding UrbanFactor as categorical column based on average urban population

Y <- nrow(gnrnew)

for (i in 1:Y) {
    if (gnrnew$avgurbval[i] >= 70) {
        gnrnew$UrbanFactor[i] <- "U"
    } else if (gnrnew$avgurbval[i] > 40 & gnrnew$avgurbval[i] < 70) {
        gnrnew$UrbanFactor[i] <- "D"
        
    } else if (gnrnew$avgurbval[i] <= 40) {
        gnrnew$UrbanFactor[i] <- "R"
    }
}

Plots And Maps

Stunting, or short height for age, and wasting, or low weight for length/height, are important public health indicators. A third indicator, underweight, or low weight for age, combines information about linear growth retardation and weight for length/height.

Map: World Urban Population

Here we see the Urbanization level of the world countries

################################## Observing Urbanization with Urban % Population
gc_urbanization <- gvisGeoChart(gnrnew, "country", "avgurbval", hovervar = "country", options = list(projection = "kavrayskiy-vii", colorAxis = "{colors:['yellow', 'green']}"))
plot(gc_urbanization)

Map: Urbanization Vs Under-5 Overweight

Plot Map for Urban Population (Color coded Avg % Of Urban Population) With Under-5 Over WEight % Population

Color Variation: Urban %
Hover Text : Over Weight % Population (prev_u5overweight)

# Setting option for printing googleVis charts locally
op <- options(gvis.plot.tag = "chart")



gnrnew$prev_u5overweight <- sub("^$", "NA", gnrnew$prev_u5overweight)

gc_urban_under5overweight <- gvisGeoChart(gnrnew, "country", "avgurbval", hovervar = "prev_u5overweight", options = list(projection = "kavrayskiy-vii", 
    colorAxis = "{colors:['yellow', 'green']}"))
plot(gc_urban_under5overweight)

cat(gc_urban_under5overweight$html$chart, file = "gc_urban_under5overweight.html")

###################################################################################

Map: Urbanization Vs Under-5 Wasting

Plot Map for Urban Population (Color coded Avg % Of Urban Population) With Under-5 Wasting % Population

Color Variation: Current Wasting %
Hover Text : Wasting % And Average Urban Population (avgurbval)

We observe the obvious unfortunate under-5 wasting in rural countries is more.

gnrnew$prev_wasting <- sub("^$", "NA", gnrnew$prev_wasting)


gc_urban_under5wasting <- gvisGeoChart(gnrnew, "country", "prev_wasting", hovervar = "avgurbval", options = list(projection = "kavrayskiy-vii", height = 700, 
    width = 500, colorAxis = "{colors:['yellow', 'red']}"))
plot(gc_urban_under5wasting)

cat(gc_urban_under5wasting$html$chart, file = "gc_urban_under5wasting.html")

###################################################################################

Map: Urbanization Vs Under-5 Stunting

Plot Map for Urban Population (Color coded Avg % Of Urban Population) With Under-5 Stunting % Population

Color Variation: Current Stunting %
Hover Text : Stunting % And Average Urban Population (avgurbval)

However we do find that the stunting in developing /urban countries is increaing at alarming rate.

gnrnew$prev_stunting_current <- sub("^$", "NA", gnrnew$prev_stunting_current)

gc_urban_under5stunting <- gvisGeoChart(gnrnew, "country", "prev_stunting_current", hovervar = "avgurbval", options = list(projection = "kavrayskiy-vii", 
    height = 700, width = 500, colorAxis = "{colors:['yellow', 'red']}"))
plot(gc_urban_under5stunting)

cat(gc_urban_under5stunting$html$chart, file = "gc_urban_under5stunting.html")

BubbleChart: Low Birth Wt. Vs Anaemic Mothers With Urban Factor

Bubble Chart that depicts countries in bubbles with data of

Low Birth Weight (Y) Vs Woman % Rate for those Anaemic and within Reproductive age (X).
Size Of Bubble : Vitamin A Deficiency
Color Variance : Urban Factor(H Highly Urban, D : Developing/Mediocre Urban, R : Rural )

# Bubble Chart Depicting the Low Birth Weight Vs Anaemic % of Reproductive Age Women Population With Vitamin A deficiency For Urban Developing And
# Rural Countries.


bub_lbw_AnaemicVitA <- gvisBubbleChart(gnrnew, idvar = "country", xvar = "WRAanaemia_RATE", yvar = "LBW", colorvar = "UrbanFactor", sizevar = "prevalence_vita", 
    options = list(height = 1200, width = 1250, hAxis = "{minValue:0, maxValue:100}"))
plot(bub_lbw_AnaemicVitA)

cat(bub_lbw_AnaemicVitA$html$chart, file = "bub_lbw_AnaemicVitA.html")

###################################################################

Analysis

Data summaries

Subsetting dataset for relevant variables under study.

gnrnew.ana <- gnrnew %>% select(country, subregionUN, prev_stunting_current, prev_wasting, prev_u5overweight, LBW, WRAanaemia_RATE, prevalence_vita, Class_IodineNutrition, 
    avgurbval, avgfeduval, avgfempval, avgwaterval, UrbanFactor)


gnrnew.ana$prev_u5overweight <- as.numeric(gnrnew.ana$prev_u5overweight)
gnrnew.ana$LBW <- as.numeric(gnrnew.ana$LBW)

gnrnew.ana$prev_stunting_current <- as.numeric(gnrnew.ana$prev_stunting_current)
gnrnew.ana$WRAanaemia_RATE <- as.numeric(gnrnew.ana$WRAanaemia_RATE)
gnrnew.ana$prev_wasting <- as.numeric(gnrnew.ana$prev_wasting)

summary(gnrnew.ana$prev_wasting)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   2.000   5.000   6.248   9.000  23.000      63

summary(gnrnew.ana$prev_u5overweight)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   4.000   6.000   7.256  10.000  23.000      67

summary(gnrnew.ana$LBW)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    4.20    6.90    9.70   10.50   11.92   34.70      12

summary(gnrnew.ana$WRAanaemia_RATE)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   12.00   20.00   25.00   28.38   33.00   58.00       8

summary(gnrnew.ana$avgfeduval)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   5.844  45.270  66.880  62.920  87.710 100.000      89

Plots

Plot : Under-5 Wasting Vs Urbanization : Linear Regression

######################################################################### 

ggplot(gnrnew.ana, aes(x = avgurbval, y = prev_wasting)) + geom_point(aes(color = WRAanaemia_RATE)) + geom_smooth(method = "lm") + ggtitle("Under-5 Wasting Vs Urbanization") + 
    labs(x = "Urban Population %", y = "Under - 5 Wasting")

m_wasteurban <- lm(gnrnew.ana$prev_wasting ~ gnrnew.ana$avgurbval)
summary(m_wasteurban)

Call: lm(formula = gnrnew.ana$prev_wasting ~ gnrnew.ana$avgurbval)

Residuals: Min 1Q Median 3Q Max -7.3884 -2.6790 -0.7562 1.5159 18.6735

Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.71474 0.97116 12.06 < 2e-16 gnrnew.ana$avgurbval -0.10876 0.01771 -6.14 9.76e-09 — Signif. codes: 0 ‘’ 0.001 ’’ 0.01 ’’ 0.05 ‘.’ 0.1 ‘’ 1

Residual standard error: 4.406 on 127 degrees of freedom (63 observations deleted due to missingness) Multiple R-squared: 0.2289, Adjusted R-squared: 0.2228 F-statistic: 37.7 on 1 and 127 DF, p-value: 9.763e-09

linear equation:

prev_wasting = 11.71474 -0.10876 * avgurbval Strong and negative linear relationship between under 5 wasting and urbaniation.

P value is much less and obvious relationship between impoverished countries and malnutrition

qqnorm(m_wasteurban$residuals)
qqline(m_wasteurban$residuals)

The normal probability plot shows heavy rightskewness

Plot : Under-5 OverWeight Vs Urbanization : Linear Regression

We observe that female employment in urbanized countries might have an affect on the overweight factor.

ggplot(gnrnew.ana, aes(x = avgurbval, y = prev_u5overweight)) + geom_point(aes(color = avgfempval)) + geom_smooth(method = "lm") + ggtitle("Under-5 Overweight Vs Urbanization") + 
    labs(x = "Urban Population %", y = "Under - 5 Overweight")

m_overwturban <- lm(gnrnew.ana$prev_u5overweight ~ gnrnew.ana$avgurbval)
summary(m_overwturban)

Call: lm(formula = gnrnew.ana$prev_u5overweight ~ gnrnew.ana$avgurbval)

Residuals: Min 1Q Median 3Q Max -7.334 -3.186 -1.061 2.291 15.527

Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.78000 1.02815 4.649 8.45e-06 * gnrnew.ana$avgurbval 0.04959 0.01883 2.633 0.00955 — Signif. codes: 0 ‘’ 0.001 ’’ 0.01 ’’ 0.05 ‘.’ 0.1 ‘’ 1

Residual standard error: 4.649 on 123 degrees of freedom (67 observations deleted due to missingness) Multiple R-squared: 0.05336, Adjusted R-squared: 0.04566 F-statistic: 6.933 on 1 and 123 DF, p-value: 0.009546 equation : prev_u5overweight = 4.78000 + 0.04959 * avgurbval

p value is 0.009546 which is much less than <0.05

Which proves that there is a relationship between OVerweight and Urbanization.

qqnorm(m_overwturban$residuals)
qqline(m_overwturban$residuals)

The normal probability plot shows heavy rightskewness

Multiple Regressions With Female Employment added to the model

Lets find if the female employment affects the overweight factor along with urbanization

m_overwturbanfemp <- lm(gnrnew.ana$prev_u5overweight ~ gnrnew.ana$avgurbval + gnrnew.ana$avgfempval)

summary(m_overwturbanfemp)

Call: lm(formula = gnrnew.ana$prev_u5overweight ~ gnrnew.ana$avgurbval + gnrnew.ana$avgfempval)

Residuals: Min 1Q Median 3Q Max -6.9799 -3.0339 -0.6331 2.9565 13.8813

Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.03266 1.47260 2.738 0.00785 ** gnrnew.ana$avgurbval 0.03419 0.03558 0.961 0.33997 gnrnew.ana$avgfempval 0.02244 0.03075 0.730 0.46810
— Signif. codes: 0 ‘’ 0.001 ’’ 0.01 ’’ 0.05 ‘.’ 0.1 ‘’ 1

Residual standard error: 4.279 on 69 degrees of freedom (120 observations deleted due to missingness) Multiple R-squared: 0.07397, Adjusted R-squared: 0.04712 F-statistic: 2.756 on 2 and 69 DF, p-value: 0.07057

equation : prev_u5overweight = 4.03266 + 0.03419 * avgurbval + 0.02244 * avgfempval

p value is 0.00040085 which is much less than <0.05

Which proves that there is a relationship between OVerweight and Urbanization

We observe that there is very less significance 0.02244 * avgfempval with every 1 unit change in female education.

qqnorm(m_overwturbanfemp$residuals)
qqline(m_overwturbanfemp$residuals)

We find these tailed in nirmal probability plot.And not a normal distribution.

Plot : Women Anaemic Vs Urbanization

Plotting For Countries With female employment as factor

While this plot should have shown a negative sharply linear relation, we find that the Anemic rate is increasing at an alarming rate, and is almost more than half of that of the impoverished countries

gnrnew.ana$prev_stunting_current <- as.numeric(gnrnew.ana$prev_stunting_current)

ggplot(gnrnew.ana, aes(x = avgurbval, y = WRAanaemia_RATE)) + geom_point(aes(color = avgfempval)) + geom_smooth(method = "lm") + ggtitle("Women Anaemic Vs Urbanization") + 
    labs(x = "Urban Population %", y = "Anaemic")

Plot : Low Birth Weight % Vs Female Education, Faceted By Urban Factor ,By Working Female Population UrbanFactor:

R : Rural
D : Developing
U : Highly Urban

ggplot(data = gnrnew.ana, aes(y = LBW, x = avgfeduval)) + geom_point(aes(color = gnrnew.ana$avgfempval)) + facet_grid(~gnrnew.ana$UrbanFactor) + ggtitle("Low Birth Weight % Vs Female Education, Faceted By Urban Factor ,By Working Female Population") + 
    scale_x_continuous(name = "Female Education Population %") + scale_y_continuous(name = "Low Birth Weight %")

# Frequency Plot

ggplot(gnrnew.ana, aes(prev_u5overweight, colour = UrbanFactor)) + geom_freqpoly() + ggtitle("Frequencey Polygon : Under -5 OverWeight : UrbanFactor Based") + 
    scale_x_continuous(name = "Under -5 OverWeight")

We do see that the female education promotes better birth weight.ie.e less % of Low Birth Weight

From the frequency plot, we find that at father right (more % of under-5 overweight) the Urban /developing countries show prominence

Inferences

To ee the significance and correlation among the various factors, we plot a correlation matrix .

cor(gnrnew.ana$prev_wasting, gnrnew.ana$avgurbval)

[1] NA

library(corrplot)
cormat <- cor(gnrnew.ana[, c("prev_wasting", "prev_stunting_current", "LBW", "WRAanaemia_RATE", "prevalence_vita", "prev_u5overweight", "avgurbval", "avgfeduval", 
    "avgfempval", "avgwaterval")], use = "pairwise.complete.obs")

corrplot(cormat, method = "color", addCoef.col = "black")

We plot a corelation matrix to find the correlation among variables.

We do see correlation among under5-overweight and urbanization, there are factors like improved water sources availbility %(although strangely), female employment that add a bit of significance. With working mother not able to provide as much attention to under -5 kids whether in urban countries could be a contributing factor.

We also found under5-stunting presence in urban countries . With the corelation matrix showing a negative corelation, and also observed from the plot, the linearity is not very strongly negative may be suggestive of the increasing problem at the moment.

Also from the plot, thereis an obvious relationship between low birth weight and under-5 wasting, which is also strongly linked to anaemic mother and prevalence of Vit A deficiency among children

Conclusion

Although, it could not be established fullfledged about other factors (improved / safe water resources , female employement) that affect over nutrition, the relationship does exist to some extent and need to be further analyzed with advance models.

The relationships although weak, would need further analysis and more data over the years for all countries. The presence lot of NA values have influenced the analysis.

We did find an obvious relation among low birth weight , under-5 stunting and rural countries, under 5-Over-weight in urban areas is a cause of concern. There could be other factors that may be affecting the development of children under 5