When I had my daughter, my delivery was relatively easy. I had no complications, and I think it went as smoothly as it could have gone. Not long after I had my daughter, my sister-in-law, who lives in Nigeria, had my nephew. While talking to her about her childbirth experience, I couldn’t help but compare it to mine. They were totally different experiences. When we spoke further, I realized that in Nigeria, delivering the child is the “easy part”. Ensuring that the child survives past five years old is the difficult part. Her fear that her son may not make it, due to diseases that we don’t even think about in the US, opened my eyes, and although more children are living past five, it is still a heavy concern. This prompted me to take a look at the diseases that are killing children around the world.
The data I will be using was obtained from the UNICEF website. It provides the percentage of infant deaths by country. Each column represents a COD (cause of death), and each row, or case, is a country in a given year. I will be focusing on the under-five data, specifically the intrapartum (during childbirth) column, preterm, as well as measles and pertussis. Lastly, we will look at the rate of children who died due to accidents. We will explore this data for Nigeria and the US for 2000 and 2015. The study that was conducted to obtain this data was an observational study done by sampling. Because this data covers children under five worldwide, I would say that it can be generalized to the population (all children under five). This table cannot be used to establish causality; rather, it gives a rate of occurrence in the past.
#load the UNICEF cause-of-death data
data <- read.csv("https://raw.githubusercontent.com/komotunde/IS606/master/Infant%20Mortality.csv")
#create subset for the countries we are exploring
inf.mort <- subset(data, Country == 'Nigeria' | Country == 'United States of America')
View(inf.mort)
#keep only the country, the year, and the five cause-of-death columns we are exploring
inf.mort <- inf.mort[c(2, 3, 4, 6, 12, 14, 15)]
View(inf.mort)
#first we will create a 2000 subset and a 2015 subset
inf.mort2000 <- subset(inf.mort, Year == 2000)
inf.mort2015 <- subset(inf.mort, Year == 2015)
require(tidyr)
## Loading required package: tidyr
require(ggplot2)
## Loading required package: ggplot2
inf.mort2000 <- gather(inf.mort2000, "COD", "Percent", 3:7)
View(inf.mort2000)
ggplot(inf.mort2000, aes(y=Percent, x=Country, fill=Country))+
facet_wrap(~COD) + ggtitle("Infant Mortality 2000")+
geom_bar(stat='identity', position = 'dodge')
The above plots show that in 2000, Nigeria had a higher percentage of infant deaths in all categories except preterm deaths and deaths related to injury. There were no deaths in the US in 2000 due to malaria or measles. We will now take a look at the data for 2015.
inf.mort2015 <- gather(inf.mort2015, "COD", "Percent", 3:7)
View(inf.mort2015)
ggplot(inf.mort2015, aes(y=Percent, x=Country, fill=Country))+
facet_wrap(~COD) + ggtitle("Infant Mortality 2015")+
geom_bar(stat='identity', position = 'dodge')
Two things I noticed from this plot are the marked decline in deaths from measles in Nigeria and the decrease in deaths from malaria in Nigeria.
Also, as I am preparing to perform my hypothesis tests, I would like to check the conditions for inference to ensure that my data meets the requirements. Based on my understanding, since my data is in percentages and we will be dealing with two separate cases, I will be performing inference for two populations (US and Nigeria). I am having a little difficulty picturing how to set up the test, as my data set does not include the number of participants, just percentages, but I will give it a try. First, we check our conditions.
We know the independence condition is met because of how the data was collected. For this particular test, I will assume that our n is 1000 and figure out how many died from each cause based on the percentages given. I would like to focus on the hypothesis tests for preterm deaths and injury, since there are no individuals who passed from malaria in the US.
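The conversion from percentages to counts, along with a success-failure check, can be sketched as follows. This is only an illustration: the n of 1000 is the assumed sample size from the text, the percentages are the preterm values used in the test, and the names `pct.preterm`, `deaths`, `p.pool`, and `conditions.met` are introduced here for the example.

```r
# Assumed sample size per country (the data set only provides percentages)
n <- 1000

# Preterm percentages for 2015 as used in the hypothesis test
pct.preterm <- c(Nigeria = 11.7, US = 26.8)

# Convert each percentage into a count of deaths out of the assumed n
deaths <- round(pct.preterm / 100 * n)

# Success-failure condition using the pooled proportion:
# both n * p.pool and n * (1 - p.pool) should be at least 10
p.pool <- sum(deaths) / (2 * n)
conditions.met <- all(c(n * p.pool, n * (1 - p.pool)) >= 10)

print(deaths)
print(conditions.met)
```

Because the counts scale directly with the assumed n, a different choice of n would change the counts but not the proportions.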
For this hypothesis test, a success will be a child whose death was due to preterm complications.
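For reference, the usual two-sided test for a difference of two proportions can be written out as below. The notation is introduced here for illustration: $x$ is the number of successes, $n$ is the (assumed) sample size for each country, and $\hat{p}_{pool}$ is the pooled proportion used for the standard error when the null hypothesis says the two proportions are equal.

```latex
\begin{align*}
H_0 &: p_{US} - p_{Nig} = 0 \qquad H_A : p_{US} - p_{Nig} \neq 0 \\
\hat{p}_{pool} &= \frac{x_{US} + x_{Nig}}{n_{US} + n_{Nig}} \\
SE &= \sqrt{\hat{p}_{pool}\,(1 - \hat{p}_{pool})\left(\frac{1}{n_{US}} + \frac{1}{n_{Nig}}\right)} \\
Z &= \frac{(\hat{p}_{US} - \hat{p}_{Nig}) - 0}{SE}
\end{align*}
```

The p-value for the two-sided test is twice the tail area of the standard normal distribution beyond $|Z|$.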
n <- 1000 #assumed sample size per country
nigpreterm.n <- 117
uspreterm.n <- 268
p.deathUS <- uspreterm.n/n #.268
p.deathNig <- nigpreterm.n/n #.117
NUL <- 0
point.estimate <- p.deathUS - p.deathNig #.151
#under the null the two proportions are equal, so the SE uses the pooled proportion
p.pool <- (uspreterm.n + nigpreterm.n)/(2*n) #.1925
SE <- sqrt(p.pool*(1-p.pool)*(1/n + 1/n)) #.0176
Z <- (point.estimate - NUL)/SE #8.56
p.value <- 2*pnorm(-abs(Z)) #two-sided p-value
#since this is a two sided test, we double the tail area beyond our z-score of 8.56 to get a p-value that is essentially 0. Since our p-value is less than .05, we reject our null, meaning that, under the assumed n of 1000, there IS a significant difference between the two countries in infant mortality in 2015 due to preterm death.
We will now perform the same testing for infant mortality due to injury.
n <- 1000 #assumed sample size per country
niginj.n <- 51
usinj.n <- 125
p.deathUSinj <- usinj.n/n #.125
p.deathNiginj <- niginj.n/n #.051
NUL <- 0
point.estimateinj <- p.deathUSinj - p.deathNiginj #.074
p.poolinj <- (usinj.n + niginj.n)/(2*n) #.088
SEinj <- sqrt(p.poolinj*(1-p.poolinj)*(1/n + 1/n)) #.0127
Zinj <- (point.estimateinj - NUL)/SEinj #5.84
p.valueinj <- 2*pnorm(-abs(Zinj)) #two-sided p-value
#With a z-score of 5.84, we get a p-value that is far less than .05. Once again, we reject the null hypothesis, indicating that, under the assumed n of 1000, there is a significant difference in injury that resulted in death.
I realized about halfway through my analysis that I could have picked a better data set, since this one reports only percentages and I had to assume a sample size. I was expecting a significant difference in each of the categories that I tested, and with an assumed n of 1000 per country that is what I found. This indicates that for 2015, in Nigeria and the United States, there was a significant difference in the percentage of infant deaths due to injury and due to preterm complications, although this result depends on the assumed sample size.
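Since the sample size in these tests is assumed rather than observed, it may be worth seeing how the test statistic responds to that choice. The sketch below is not part of the original analysis: `two.prop.z` is a hypothetical helper written for this example, and the values in `assumed.n` are arbitrary. It uses the pooled standard error for a two-sided test on the preterm proportions.

```r
# Hypothetical helper: pooled two-proportion z-test (two-sided)
two.prop.z <- function(p1, p2, n1, n2) {
  # Pooled proportion under the null hypothesis p1 = p2
  p.pool <- (p1 * n1 + p2 * n2) / (n1 + n2)
  se <- sqrt(p.pool * (1 - p.pool) * (1 / n1 + 1 / n2))
  z <- (p1 - p2) / se
  c(z = z, p.value = 2 * pnorm(-abs(z)))
}

# Arbitrary candidate sample sizes, assumed equal for both countries
assumed.n <- c(100, 500, 1000)

# Preterm proportions for 2015: US = .268, Nigeria = .117
sensitivity <- t(sapply(assumed.n, function(n) two.prop.z(0.268, 0.117, n, n)))
rownames(sensitivity) <- paste0("n=", assumed.n)
print(sensitivity)
```

For fixed proportions and equal group sizes, the z-score grows in proportion to the square root of n, so the conclusion of the test is tied to whatever sample size is assumed.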