R. H. Lowe Jr., M.S. | February 11, 2019 | Dem 7283 | University of Texas at San Antonio

Research Purpose

The purpose of this assignment is to develop a model that estimates the probability of a Ghanaian woman reporting an unintentional pregnancy - dependent on her educational level, marital status and age. I also seek to examine whether this probability increases or decreases if the subject is currently using some sort of contraceptive. Data was collected from the MICS 4 Questionnaire for Individual Women in 2011, by UNICEF and is specific to Ghanaian women aged between 15–49.

Before running the models, all required packages are installed:

library(car)
## Loading required package: carData
library(stargazer)
## 
## Please cite as:
##  Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
library(survey)
## Loading required package: grid
## Loading required package: Matrix
## Loading required package: survival
## 
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
## 
##     dotchart
library(questionr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
## 
##     recode
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

All variables are recoded accordingly, starting with the DV (pregnancy intentions). Weights (wmweight) and PSU (HH1) are included prior to telling R the survey design:

library(haven)
wm <- read_sav("Desktop/Data/Ghana MICS 2011 SPSS Datasets/wm.sav")

#Pregnancy Intentions: "When you got pregnant, did you want to get pregnant at that time?"
wm$pregintnt<-Recode(wm$DB1,recodes="1=0;2=1;9=NA")

#Age: "How old are you?"
wm$agec<-Recode(wm$WB2CA,recodes="15:19='15-19';20:29='20-29';30:39='30-39';40:49='40-49';else=NA",as.factor=T)
wm$agec<-relevel(wm$agec,ref='40-49')

#Educational Attainment:"What is the highest level of school you attended?"
wm$educ<-Recode(wm$WB4,recodes="0='1Pre';1='2Prim';2='3Sec';3='4High';4:6=NA", as.factor=T)
wm$educ<-relevel(wm$educ,ref='4High')

#Contraceptive Use:"Are you currently doing something or using any method to delay or avoid getting pregnant?"
wm$cntrcpt<-Recode(wm$CP2,recodes="1='1Cntrcpt';2='2NoCntrcpt';8:9=NA",as.factor=T)

#Marriage:"Are you currently married or living together with a man as if married?"
wm$marrg<-Recode(wm$MA1,recodes="1='1Marrd';2='2Cohab';3='3Sngl';9=NA",as.factor=T)

sub<-wm%>%
  select(pregintnt,agec,educ,cntrcpt,marrg,wmweight,HH1) %>%
  filter( complete.cases(.))

options(survey.lonely.psu = "adjust")
des<-svydesign(ids=~HH1,weights=~wmweight, data=sub)

The percentage of unintended preganancies by age is plotted and a survey-corrected chi-square test for independence is performed. Results indicate a signficant relationship between unintentional pregnancy and age, F (2.92, 1802.2) = 7.08), p < .05

library(ggplot2)
cat<-svyby(formula = ~pregintnt, by = ~agec, design = des, FUN = svymean, na.rm=T)
svychisq(~pregintnt+agec, design = des)
## 
##  Pearson's X^2: Rao & Scott adjustment
## 
## data:  svychisq(~pregintnt + agec, design = des)
## F = 7.0848, ndf = 2.9208, ddf = 1802.2000, p-value = 0.0001169
qplot(x=cat$agec,y=cat$pregintnt, data=cat ,xlab="Age", ylab="%  Pregnancy Intentions" )+
geom_errorbar(aes(x=agec, ymin=pregintnt-1.96*se,ymax= pregintnt+1.96*se), width=.25)+
ggtitle(label = "% of Ghanaian Women with Unintended Pregnancies by Age")

The percentage of unintended preganancies by educational attainment is plotted and another survey-corrected chi-square test for independence performed. Results indicate that there is a significant relationship between unintended pregnancies and educational attainment, F (2.55, 1575) = 3.0584), p < .05

library(ggplot2)
cat<-svyby(formula = ~pregintnt, by = ~educ, design = des, FUN = svymean, na.rm=T)
svychisq(~pregintnt+educ, design = des)
## 
##  Pearson's X^2: Rao & Scott adjustment
## 
## data:  svychisq(~pregintnt + educ, design = des)
## F = 3.0584, ndf = 2.5528, ddf = 1575.0000, p-value = 0.03489
qplot(x=cat$educ,y=cat$pregintnt, data=cat ,xlab="Education", ylab="%  Pregnancy Intentions" )+
geom_errorbar(aes(x=educ, ymin=pregintnt-1.96*se,ymax= pregintnt+1.96*se), width=.25)+
ggtitle(label = "% of Ghanaian Women with Unintended Pregnancies by Education")

Another cross tabulation is performed with both covariates:

catdog<-svyby(formula = ~pregintnt, by = ~agec+educ, design = des, FUN = svymean, na.rm=T)
catdog
##              agec  educ pregintnt         se
## 40-49.4High 40-49 4High 0.0000000 0.00000000
## 15-19.4High 15-19 4High 1.0000000 0.00000000
## 20-29.4High 20-29 4High 0.4296271 0.06602270
## 30-39.4High 30-39 4High 0.3164448 0.07577250
## 15-19.1Pre  15-19  1Pre 1.0000000 0.00000000
## 20-29.1Pre  20-29  1Pre 0.6519622 0.28341094
## 30-39.1Pre  30-39  1Pre 0.6443783 0.32433651
## 40-49.2Prim 40-49 2Prim 0.6148152 0.09270836
## 15-19.2Prim 15-19 2Prim 0.7408586 0.08687212
## 20-29.2Prim 20-29 2Prim 0.5062683 0.04490183
## 30-39.2Prim 30-39 2Prim 0.4371690 0.04363962
## 40-49.3Sec  40-49  3Sec 0.5159563 0.10379197
## 15-19.3Sec  15-19  3Sec 0.7577802 0.06695439
## 20-29.3Sec  20-29  3Sec 0.5394866 0.03658701
## 30-39.3Sec  30-39  3Sec 0.4763527 0.04716227

Here, I run Logit & Probit Models, both of which include all covariates in the intial survey design. Odds Ratios and Confidence intervals are performed as well:

fit.logit<-svyglm(pregintnt~agec+educ+cntrcpt+marrg,
                  design= des,
                  family=binomial)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
exp(coef(fit.logit))#odds ratios
##       (Intercept)         agec15-19         agec20-29         agec30-39 
##         0.7557847         1.6217066         0.7188254         0.6664974 
##          educ1Pre         educ2Prim          educ3Sec cntrcpt2NoCntrcpt 
##         4.7259128         1.3710198         1.5127027         0.7160433 
##       marrg2Cohab        marrg3Sngl 
##         2.3729842         2.7594222
exp(confint(fit.logit)) #confidence intervals for odds ratios
##                       2.5 %     97.5 %
## (Intercept)       0.3604891  1.5845432
## agec15-19         0.7077647  3.7158284
## agec20-29         0.3781836  1.3662939
## agec30-39         0.3530257  1.2583188
## educ1Pre          0.9304392 24.0039886
## educ2Prim         0.8193033  2.2942605
## educ3Sec          0.9293711  2.4621698
## cntrcpt2NoCntrcpt 0.5165317  0.9926168
## marrg2Cohab       1.6979681  3.3163485
## marrg3Sngl        1.7064306  4.4621860
#probit model
fit.probit<-svyglm(pregintnt~agec+educ+cntrcpt+marrg,
                   design=des,
                   family=binomial(link= "probit"))
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
exp(confint(fit.probit)) #confidence intervals for odds ratios
##                       2.5 %    97.5 %
## (Intercept)       0.5291179 1.3232040
## agec15-19         0.8127579 2.2292783
## agec20-29         0.5496716 1.2194245
## agec30-39         0.5265041 1.1583438
## educ1Pre          1.0459086 6.7854585
## educ2Prim         0.8853588 1.6638537
## educ3Sec          0.9562822 1.7383989
## cntrcpt2NoCntrcpt 0.6662039 0.9956586
## marrg2Cohab       1.3921459 2.1050337
## marrg3Sngl        1.3965949 2.5231537
myexp<-function(x){exp(x)}

stargazer(fit.logit,
          fit.probit,
          type = "text",
          style="demography",
          covariate.labels=c("Age 15-19" ,"Age 20-29","Age 30-39","Preschool", "Primary School", "Secondary Education","Contraceptive                  Use", "Living with Male, Unmarried", "Single"),
          ci=T, column.sep.width = "10pt",apply.coef = myexp)
## 
## -----------------------------------------------------------
##                                        pregintnt           
##                             survey-weighted survey-weighted
##                                logistic         probit     
##                                 Model 1         Model 2    
## -----------------------------------------------------------
## Age 15-19                      1.622***        1.346***    
##                             (0.793, 2.451)  (0.842, 1.851) 
## Age 20-29                       0.719*         0.819***    
##                             (0.077, 1.361)  (0.420, 1.217) 
## Age 30-39                       0.666*         0.781***    
##                             (0.031, 1.302)  (0.387, 1.175) 
## Preschool                      4.726***        2.664***    
##                             (3.101, 6.351)  (1.729, 3.599) 
## Primary School                 1.371***        1.214***    
##                             (0.856, 1.886)  (0.898, 1.529) 
## Secondary Education            1.513***        1.289***    
##                             (1.026, 2.000)  (0.991, 1.588) 
## Contraceptive Use              0.716***        0.814***    
##                             (0.389, 1.043)  (0.614, 1.015) 
## Living with Male, Unmarried    2.373***        1.712***    
##                             (2.038, 2.708)  (1.505, 1.919) 
## Single                         2.759***        1.877***    
##                             (2.279, 3.240)  (1.581, 2.173) 
## Constant                        0.756*         0.837***    
##                             (0.015, 1.496)  (0.378, 1.295) 
## N                                1,291           1,291     
## Log Likelihood                 -786.262        -786.233    
## AIC                            1,592.525       1,592.466   
## -----------------------------------------------------------
## *p < .05; **p < .01; ***p < .001

The models mutually suggest that women aged 15-19 had an increased chance of reporting an unintended pregnancy compared to their middle aged counterparts (reference group = Age 40-49). Similarly, Ghanaian women with just a primary school education had an increased chance in reporting an unintended pregnancy as well. Interestingly, the probit model has stronger significance levels than the logit model across age cagtegories and the constant. Next, I retrieve marginal effects (and plot them) to determine the rate of change between the dependent and independent variables.

#Logit marginal effects
log.marg<-coef(fit.logit)*mean(dlogis(predict(fit.logit)), na.rm=T)

#Progit marginal effects
prob.marg<-coef(fit.probit)*mean(dnorm(predict(fit.probit)), na.rm=T)


effects<-data.frame(effect=c(log.marg, prob.marg),
                    term=rep(names(log.marg),2),
                    model=c(rep("logit", length(log.marg)),rep("probit", length(log.marg)))
                    )

effects%>%
  ggplot(aes(x=term, y=effect))+geom_point(aes(color=model, group=model, shape=model),
  position=position_jitterdodge(jitter.width = 1),
             size=2)+
  ylab("Marginal Effect")+
  xlab("Model Term")+
  geom_abline(intercept = 0, slope=0)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))+
  ggtitle(label = "Comparison of marginal effects in Logit and Probit models")

We see that marginal effects are comparable between the logit and probit model, though large distinctions exist with the first two age categories of 15-19 and 20-29.

#get a series of predicted probabilites for different "types" of people for each model
#expand.grid will generate all possible combinations of values you specify
dat<-expand.grid(educ=levels(wm$educ), agec=levels(wm$agec),marrg=levels(wm$marrg),cntrcpt=levels(wm$cntrcpt))

#You MAY need to get rid of impossible cases here

#generate the fitted values
fit<-predict(fit.logit, newdat=dat,type="response")
fitp<-predict(fit.probit, newdat=dat,type="response")
#add the values to the fake data
dat$fitted.prob.lrm<-round(fit, 3)
dat$fitted.prob.pro<-round(fitp, 3)

#Print the fitted probabilities for the first 20 cases
knitr::kable(head(dat, n=20))
educ agec marrg cntrcpt fitted.prob.lrm fitted.prob.pro
4High 40-49 1Marrd 1Cntrcpt 0.430 0.429
1Pre 40-49 1Marrd 1Cntrcpt 0.781 0.789
2Prim 40-49 1Marrd 1Cntrcpt 0.509 0.506
3Sec 40-49 1Marrd 1Cntrcpt 0.533 0.530
4High 15-19 1Marrd 1Cntrcpt 0.551 0.547
1Pre 15-19 1Marrd 1Cntrcpt 0.853 0.864
2Prim 15-19 1Marrd 1Cntrcpt 0.627 0.623
3Sec 15-19 1Marrd 1Cntrcpt 0.650 0.645
4High 20-29 1Marrd 1Cntrcpt 0.352 0.353
1Pre 20-29 1Marrd 1Cntrcpt 0.720 0.726
2Prim 20-29 1Marrd 1Cntrcpt 0.427 0.427
3Sec 20-29 1Marrd 1Cntrcpt 0.451 0.451
4High 30-39 1Marrd 1Cntrcpt 0.335 0.335
1Pre 30-39 1Marrd 1Cntrcpt 0.704 0.710
2Prim 30-39 1Marrd 1Cntrcpt 0.409 0.408
3Sec 30-39 1Marrd 1Cntrcpt 0.432 0.432
4High 40-49 2Cohab 1Cntrcpt 0.642 0.640
1Pre 40-49 2Cohab 1Cntrcpt 0.894 0.910
2Prim 40-49 2Cohab 1Cntrcpt 0.711 0.710
3Sec 40-49 2Cohab 1Cntrcpt 0.731 0.730

There are several alarming findings in this analysis. Specifically, I specify the probability of reporting an unintended pregnancy for pre-school educated, cohabitiating woman aged between 40-49 years, compared to a highly educated woman aged between 15-19.

dat[which(dat$educ=="1Pre"&dat$agec=="40-49"&dat$marrg=="2Cohab"),]
##    educ  agec  marrg    cntrcpt fitted.prob.lrm fitted.prob.pro
## 18 1Pre 40-49 2Cohab   1Cntrcpt           0.894           0.910
## 66 1Pre 40-49 2Cohab 2NoCntrcpt           0.859           0.872
dat[which(dat$educ=="4High"&dat$agec=="30-39"&dat$marrg=="1Marrd"),]
##     educ  agec  marrg    cntrcpt fitted.prob.lrm fitted.prob.pro
## 13 4High 30-39 1Marrd   1Cntrcpt           0.335           0.335
## 61 4High 30-39 1Marrd 2NoCntrcpt           0.265           0.264

Here, we see that highly educated Ghanaian women in union and aged between 30-39 have only a 33.5 percent probability of reporting an unintentional pregnancy; contrast to pre-school educated Ghanaian women, unmarried but cohabiting with male, and aged between 40-49 - who have an estimated probability of reporting unintentional pregancy of about 89 percent. Women with higher probabilities however are mostly using some sort of contraceptive, while those who have less likelihood in reporting unintential pregnancy are not currently using a contraceptive, which could potentially put them at risk.