First let’s merge the data.

gdp_data[, c("V2", "V3", "V4", "V5", "V6", "V7")] <- 
  lapply(gdp_data[, c("V2", "V3", "V4", "V5", "V6", "V7")], function(x) as.numeric(gsub(",", "", x)))

## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
## Warning in FUN(X[[i]], ...): NAs introduced by coercion

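# Average GDP across columns V2 to V7, ignoring missing values
gdp_data$Average_GDP <- rowMeans(gdp_data[, c("V2", "V3", "V4", "V5", "V6", "V7")], na.rm = TRUE)

# Left join: keep every row of clean_data; countries with no GDP match get NA
joined_data <- clean_data %>%
  left_join(
    gdp_data %>% select(V1, Average_GDP),
    by = c("countries_and_areas" = "V1"))

# Drop rows that contain any missing value
clean_joined_data <- na.omit(joined_data)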

# I used ChatGPT to help me join my two data sets!
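The "NAs introduced by coercion" warnings above appear when `as.numeric()` meets a cell that is still not a number after the commas are stripped. As a rough sketch on made-up data (the `toy_gdp` and `toy_health` data frames, their country names, and their values are hypothetical and are not the project data), the following base R code shows one way to see which cells trigger the coercion and which countries fail to find a match in the join:

```r
# Hypothetical toy data; names and values are illustrative only.
toy_gdp <- data.frame(
  V1 = c("Aland", "Borduria", "Syldavia"),
  V2 = c("1,200", "..", "3,450"),
  V3 = c("1,300", "2,100", "n/a"),
  stringsAsFactors = FALSE
)
toy_health <- data.frame(
  countries_and_areas = c("Aland", "Borduria", "Freedonia"),
  delivery_care_institutional = c(90, 55, 70),
  stringsAsFactors = FALSE
)

num_cols <- c("V2", "V3")

# Strip thousands separators, then convert; non-numeric cells become NA.
to_number <- function(x) suppressWarnings(as.numeric(gsub(",", "", x)))
converted <- lapply(toy_gdp[num_cols], to_number)

# Cells that had text originally but are NA after conversion.
bad_cells <- mapply(function(raw, num) raw[is.na(num) & !is.na(raw)],
                    toy_gdp[num_cols], converted, SIMPLIFY = FALSE)
print(bad_cells)

toy_gdp[num_cols] <- converted
toy_gdp$Average_GDP <- rowMeans(toy_gdp[num_cols], na.rm = TRUE)

# Base R equivalent of a left join: all.x = TRUE keeps every row of toy_health.
joined <- merge(toy_health, toy_gdp[c("V1", "Average_GDP")],
                by.x = "countries_and_areas", by.y = "V1", all.x = TRUE)

# Countries that found no GDP match (these rows would be dropped by na.omit).
unmatched <- joined$countries_and_areas[is.na(joined$Average_GDP)]
print(unmatched)
```

Listing the unmatched countries before calling `na.omit()` makes it easier to tell whether rows are lost because of genuinely missing data or because of spelling differences between the two country columns.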

Question 1: Earlier, when working with this question, my main objective was to identify the relationship between the percentage of women who give birth in a health facility in a given country and that country’s maternal mortality ratio. The data that I used to answer this question was taken from UNICEF’s (United Nations International Children’s Emergency Fund) State of The World’s Data Set, published in October 2019. I focused especially on the data for “Maternal and Newborn Health”, collected from 2013 to 2018. To answer this question, I fit a linear regression model on my two variables of interest as represented in the data: (1) “institutional delivery” or “delivery_care_institutional” (i.e., the percentage of women aged 15–49 who gave birth in a health facility) and (2) “maternal mortality ratio” (i.e., the number of deaths of women from pregnancy-related causes per 100,000 live births). The model showed an inverse relationship between the two variables; in other words, a country’s maternal mortality ratio tends to be lower when a higher percentage of women give birth in a health facility. To strengthen this inference, I can identify a confounding variable that may influence both of my chosen variables and run a multivariate analysis.

Question 2: The multiple regression model that I would like to estimate is: $\text{Maternal Mortality Ratio}_i = \beta_0 + \beta_1 (\text{Institutional Delivery})_{i1} + \beta_2 (\text{Country GDP})_{i2} + \varepsilon_i$. I think this proposed model would be a good representation of the data generating process because, as briefly mentioned in the last question, the added variable (a country’s GDP) plausibly influences both “institutional delivery” and “maternal mortality ratio”. In essence, countries with a higher GDP may be able to allocate more funding and other resources to institutions such as hospitals. The better equipped a hospital or health facility is (with medical equipment, doctors, etc.), the more likely it is that preventive measures can be implemented to reduce serious harm or death (in this instance, death due to pregnancy-related causes).
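To see why leaving a confounder out of the model matters, here is a small simulation sketch. Everything in it is made up for illustration: the variable names `gdp`, `delivery`, and `mmr`, the sample size, and the true coefficients are assumptions, not estimates from the UNICEF data. It compares the coefficient on institutional delivery with and without the confounder in the model.

```r
# Simulated data only; the "true" coefficients are assumptions chosen for illustration.
set.seed(42)
n <- 1000
gdp      <- rnorm(n, mean = 50, sd = 10)                  # hypothetical confounder
delivery <- 20 + 1.0 * gdp + rnorm(n, sd = 5)             # delivery depends on gdp
mmr      <- 900 - 3 * delivery - 6 * gdp + rnorm(n, sd = 20)  # true delivery effect is -3

# Simple regression: gdp is omitted, so delivery absorbs part of its effect.
simple_fit <- lm(mmr ~ delivery)

# Multiple regression: gdp is held constant.
multiple_fit <- lm(mmr ~ delivery + gdp)

b_simple   <- coef(simple_fit)[["delivery"]]
b_multiple <- coef(multiple_fit)[["delivery"]]

cat("Delivery coefficient, simple model:  ", round(b_simple, 2), "\n")
cat("Delivery coefficient, multiple model:", round(b_multiple, 2), "\n")
```

In this setup the simple model overstates the size of the delivery effect because GDP pushes both variables in a consistent direction, which is the same reasoning behind adding GDP to the project model.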

Question 3: Multicollinearity occurs when two or more independent variables are highly correlated with each other. If I understand the term correctly, there may be some slight reason for concern (at least on the surface) because, as hypothesized in Question 2, GDP is expected to influence institutional delivery: “richer” countries have more resources and more spending, so the two predictors are likely to move together. That correlation could inflate the standard errors of both coefficients.
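One way to put a number on this concern is the variance inflation factor (VIF). The sketch below uses simulated data (the `gdp` and `delivery` vectors and their relationship are assumptions for illustration only) and computes the VIF by hand in base R as 1 / (1 - R²), where R² comes from regressing one predictor on the other.

```r
# Simulated predictors; the strength of their relationship is an assumption.
set.seed(1)
n <- 200
gdp      <- rnorm(n, mean = 50, sd = 10)
delivery <- 20 + 0.8 * gdp + rnorm(n, sd = 8)

# Pairwise correlation between the two predictors.
r <- cor(gdp, delivery)

# VIF for delivery: regress it on the other predictor(s) and use that R-squared.
r2_delivery  <- summary(lm(delivery ~ gdp))$r.squared
vif_delivery <- 1 / (1 - r2_delivery)

cat("Correlation between predictors:", round(r, 3), "\n")
cat("VIF for delivery:", round(vif_delivery, 3), "\n")
```

With only two predictors the VIF is the same for both and depends only on their correlation. A commonly quoted rule of thumb treats values above roughly 5 to 10 as a warning sign, although that threshold is a convention rather than a strict cutoff.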

Question 4: Run Regression Model

model1 <- lm(maternal_mortality_ratio_2017 ~ delivery_care_institutional + Average_GDP, data = clean_joined_data)

summary(model1)

## 
## Call:
## lm(formula = maternal_mortality_ratio_2017 ~ delivery_care_institutional + 
##     Average_GDP, data = clean_joined_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -357.32 -141.66  -10.05  102.60  774.33 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 771.20416  108.76237   7.091 3.91e-09 ***
## delivery_care_institutional  -5.29167    1.64038  -3.226  0.00219 ** 
## Average_GDP                  -0.03090    0.01356  -2.278  0.02692 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 211.3 on 51 degrees of freedom
## Multiple R-squared:  0.4064, Adjusted R-squared:  0.3831 
## F-statistic: 17.46 on 2 and 51 DF,  p-value: 1.676e-06

#I will now summarize the data using stargazer:

stargazer(model1, 
          type = "text",
          title = "Regression Results",
          dep.var.labels = "Maternal Mortality Ratio 2017",
          covariate.labels = c("Delivery Care Institutional", "Countries GDP", "Intercept"),  # stargazer prints the intercept last
          omit.stat = c("f", "ser"),
          report = "vc*st",
          star.cutoffs = c(0.05, 0.01, 0.001))

## 
## Regression Results
## =========================================================
##                                  Dependent variable:     
##                             -----------------------------
##                             Maternal Mortality Ratio 2017
## ---------------------------------------------------------
## Delivery Care Institutional           -5.292**           
##                                        (1.640)           
##                                      t = -3.226          
##                                                          
## Countries GDP                          -0.031*           
##                                        (0.014)           
##                                      t = -2.278          
##                                                          
## Intercept                            771.204***          
##                                       (108.762)          
##                                       t = 7.091          
##                                                          
## ---------------------------------------------------------
## Observations                             54              
## R2                                      0.406            
## Adjusted R2                             0.383            
## =========================================================
## Note:                       *p<0.05; **p<0.01; ***p<0.001

Question 5: The coefficient estimate tells you the expected change in the response variable for a one-unit increase in a predictor variable, holding the other predictor constant. One caution when reading the stargazer table: stargazer prints the intercept as the last row, so the labels passed to covariate.labels have to follow that order. The value 771.204 is the intercept (the predicted maternal mortality ratio when both predictors are zero), not the coefficient on GDP. Reading the estimates from summary(model1): the coefficient on “Delivery Care Institutional” is -5.292, meaning that each one-percentage-point increase in the share of women who give birth in a health facility is associated with about 5.3 fewer maternal deaths per 100,000 live births, holding GDP constant (p = 0.002). The coefficient on “Countries GDP” is -0.031, meaning that each one-unit increase in average GDP is associated with a decrease of about 0.03 in the maternal mortality ratio, holding institutional delivery constant (p = 0.027). The GDP coefficient looks small because it is expressed per single unit of GDP. My R2 value is 0.406, meaning that about 40.6% of the variation in the maternal mortality ratio is explained by my model. I’m not exactly sure how good a fit this is; I would say it is fairly weak, since more than half of the variation is left unexplained.
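A sketch of two habits that make coefficient tables easier to read correctly, using simulated data (the `toy` data frame, its column names, and its coefficients are assumptions for illustration, not the project data). First, display labels are attached by term name rather than by position, so they cannot shift onto the wrong row. Second, a predictor measured in small units is rescaled so its coefficient is easier to interpret.

```r
# Simulated data; column names and true coefficients are illustrative assumptions.
set.seed(7)
n <- 60
toy <- data.frame(delivery = runif(n, 20, 100),
                  gdp      = runif(n, 500, 20000))
toy$mmr <- 800 - 5 * toy$delivery - 0.02 * toy$gdp + rnorm(n, sd = 100)

fit   <- lm(mmr ~ delivery + gdp, data = toy)
coefs <- coef(summary(fit))   # matrix whose rows are named by model term

# Attach display labels by term name, not by position.
labels <- c("(Intercept)" = "Intercept",
            delivery      = "Delivery Care Institutional",
            gdp           = "Countries GDP")
report <- data.frame(term     = unname(labels[rownames(coefs)]),
                     estimate = coefs[, "Estimate"],
                     p_value  = coefs[, "Pr(>|t|)"],
                     row.names = NULL)
print(report)

# Rescale GDP to thousands: the coefficient is multiplied by 1000,
# while the t value and p-value stay the same.
fit_k <- lm(mmr ~ delivery + I(gdp / 1000), data = toy)
print(coef(summary(fit_k)))
```

The rescaled model describes exactly the same fit; only the unit in which the GDP effect is reported changes.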

Question 6: Build Confidence Interval

confint(model1, level = 0.95)

##                                   2.5 %        97.5 %
## (Intercept)                 552.8545765 989.553733887
## delivery_care_institutional  -8.5848691  -1.998471210
## Average_GDP                  -0.0581205  -0.003672194

Question 6 cont: For “Delivery Care Institutional”, the 95% confidence interval (-8.58 to -2.00) is entirely negative. This indicates a negative relationship between delivery care and maternal mortality (for every one-unit increase in the percentage of deliveries in health facilities, the maternal mortality ratio decreases, holding GDP constant). This predictor variable is statistically significant at the 5% level, as 0 does not lie within the interval. For Average GDP (i.e., a country’s GDP), the confidence interval (-0.058 to -0.004) is also entirely negative. This indicates a negative relationship between GDP and maternal mortality (i.e., for every one-unit increase in a country’s GDP, the maternal mortality ratio decreases, although by a smaller magnitude per unit). Since 0 does not lie within this interval either, this predictor variable is also statistically significant at the 5% level, which agrees with its p-value of 0.027 in the regression summary.
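As a check on how these intervals are built, the sketch below recomputes them by hand from the estimates, standard errors, and residual degrees of freedom printed by summary(model1). The numbers are copied from that output; the object names (`est`, `se`, `ci`) are just illustrative. Each interval is the estimate plus or minus the critical t value times the standard error.

```r
# Values copied from the summary(model1) output; object names are illustrative.
est <- c(delivery_care_institutional = -5.29167, Average_GDP = -0.03090)
se  <- c(delivery_care_institutional =  1.64038, Average_GDP =  0.01356)
df_resid <- 51

# Two-sided 95% interval: estimate +/- t critical value * standard error.
t_crit <- qt(0.975, df = df_resid)
ci <- cbind(lower = est - t_crit * se,
            upper = est + t_crit * se)
print(ci)

# A predictor is significant at the 5% level when its interval excludes 0.
excludes_zero <- ci[, "lower"] > 0 | ci[, "upper"] < 0
print(excludes_zero)
```

Small differences from the confint() output in the last decimal places are expected, because the inputs here are rounded.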

Question 7: Run an F-test

install.packages("car")

## Installing package into '/Users/semihaaa/Library/R/arm64/4.4/library'
## (as 'lib' is unspecified)

## 
## The downloaded binary packages are in
##  /var/folders/2l/kt9lbnv54px1bthtt8ntyqlw0000gn/T//Rtmp5SCcxU/downloaded_packages

library(car)

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:openintro':
## 
##     densityPlot

## The following object is masked from 'package:dplyr':
## 
##     recode

# Perform F-test for joint significance
linearHypothesis(model1, c("delivery_care_institutional=0", "Average_GDP=0"))

## 
## Linear hypothesis test:
## delivery_care_institutional = 0
## Average_GDP = 0
## 
## Model 1: restricted model
## Model 2: maternal_mortality_ratio_2017 ~ delivery_care_institutional + 
##     Average_GDP
## 
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1     53 3834602                                  
## 2     51 2276231  2   1558371 17.458 1.676e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

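# I had trouble performing the F-test. I looked at the textbook reference you provided and copied the code (I think?), but I kept getting an error message, which may have been a mistake on my part. The linearHypothesis() call above did run: with F = 17.458 on 2 and 51 degrees of freedom and p = 1.676e-06, the two predictors are jointly significant, which matches the F-statistic reported by summary(model1).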
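The F-statistic can also be reproduced by hand from the two residual sums of squares in the table above, which is a useful sanity check when a package call misbehaves. This is a sketch using the printed values; the object names are illustrative. The restricted model here is the intercept-only model, so the same two numbers also give R².

```r
# Values copied from the linearHypothesis() output; object names are illustrative.
rss_restricted <- 3834602   # intercept-only model (both slopes set to 0)
rss_full       <- 2276231   # model with both predictors
q       <- 2                # number of restrictions being tested
df_full <- 51               # residual degrees of freedom of the full model

# F = (drop in RSS per restriction) / (residual variance of the full model)
f_stat  <- ((rss_restricted - rss_full) / q) / (rss_full / df_full)
p_value <- pf(f_stat, df1 = q, df2 = df_full, lower.tail = FALSE)

# With an intercept-only restricted model, R-squared = 1 - RSS_full / RSS_restricted.
r_squared <- 1 - rss_full / rss_restricted

cat("F statistic:", round(f_stat, 3), "\n")
cat("p-value:    ", signif(p_value, 4), "\n")
cat("R-squared:  ", round(r_squared, 4), "\n")
```

In base R, the same comparison can be made without the car package by fitting the intercept-only model and passing both fits to anova().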

Question 8: Residuals & Density Distributions

# Building the residual plot (reusing model1 from Question 4)
ggplot(model1, aes(.fitted, .resid)) +
  geom_point() +
  geom_hline(yintercept = 0) +
  labs(
    title = "Residual Plot",
    x = "Predicted Maternal Mortality Ratio",
    y = "Residuals (Errors)"
  )

# Building the density plot of the residuals
ggplot(data.frame(residual = residuals(model1)), aes(x = residual)) +
  geom_density(fill = "slateblue3", alpha = 0.5) +
  labs(title = "Density Plot of Residuals", x = "Residuals", y = "Density") +
  theme_minimal()

# Question 8 cont: The residuals in my density plot do not appear to be normally distributed, but rather a bit skewed, with most of the mass sitting to the left. The residual summary from Question 4 points the same way: the median is -10.05 while the residuals range from -357.32 to 774.33, so the longer tail is on the right. This contrasts with the shape of the residual distribution in my simple regression analysis, which appeared closer to normal (although also a bit skewed to the left).
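Judging skew from a density plot by eye can be hard, so a numeric check may help. The sketch below defines a small `skewness()` helper (a hypothetical function written here, not part of base R) and applies it, together with the base R Shapiro-Wilk test, to simulated right-skewed values standing in for residuals. The `toy_resid` vector is made up for illustration.

```r
# Moment-based sample skewness; this helper is defined here and is not a base R function.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Simulated stand-in for residuals: right-skewed, centered near zero.
set.seed(123)
toy_resid <- rexp(200) - 1

skew_value <- skewness(toy_resid)
sw_p_value <- shapiro.test(toy_resid)$p.value

# Positive skewness means the longer tail is on the right;
# a median below the mean points in the same direction.
cat("Skewness:", round(skew_value, 3), "\n")
cat("Median below mean:", median(toy_resid) < mean(toy_resid), "\n")
cat("Shapiro-Wilk p-value:", signif(sw_p_value, 3), "\n")
```

On the project model, the same two calls could be applied to residuals(model1). A small Shapiro-Wilk p-value would suggest the residuals depart from normality, though with only 54 observations the test has limited power.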

Question 9: My initial hypothesis was that a country with a “higher” GDP would have a lower maternal mortality ratio, due to more resources being allocated to institutions such as hospitals, in addition to the effect of women being able to give birth in a health facility (as was supported independently in my simple regression analysis). The confidence intervals in Question 6 support this hypothesis: both are entirely negative. At first the Stargazer summary seemed to indicate the complete opposite, and I was unsure how to explain it, but the 771.204 that I read as the GDP coefficient is the intercept, which stargazer prints as the last row. The estimates from summary(model1) are negative for both predictors (-5.292 for institutional delivery and -0.031 for GDP). The F-test from step seven points the same way (F = 17.458, p = 1.676e-06), so the two predictors are jointly significant. What I can infer is an association rather than proof of causation: countries where more women give birth in a health facility, and countries with a higher GDP, tend to have a lower maternal mortality ratio. There may still be an error in how I joined the two data sets, or in where I originally sourced the data for the country’s GDP (although I doubt it), and only 54 countries remain after removing missing values, so the results should be read with caution.

Question 10: Moving forward, there are several directions that this project may go in. I could theoretically leave it as it is, but I do not feel that would benefit my continued learning and growth. I think the best direction forward is to inspect the work I’ve done so far and look at where I went “wrong” or had any misunderstanding. I think going back and using the textbook provided will be useful in making that happen. I would also like to look at published articles, papers, and other literature on my question(s) of interest to see how the authors modeled their data and interpreted their findings. Although it is a bit dated (published in 2005), I found a paper in The Lancet that investigates a question similar to the ones posed in my project(s): “Where giving birth is a forecast of death: maternal mortality in four districts of Afghanistan, 1999–2002” by Bartlett et al.

Project Week # 3

Semiha Salami

2024-12-10
