The purpose of this assignment is to develop a model that estimates the probability of a Ghanaian woman reporting an unintentional pregnancy - dependent on her educational level, marital status and age. I also seek to examine whether this probability increases or decreases if the subject is currently using some sort of contraceptive. Data was collected from the MICS 4 Questionnaire for Individual Women in 2011, by UNICEF and is specific to Ghanaian women aged between 15–49.
Before running the models, all required packages are installed:
library(car)
## Loading required package: carData
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
library(survey)
## Loading required package: grid
## Loading required package: Matrix
## Loading required package: survival
##
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
##
## dotchart
library(questionr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
##
## recode
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
All variables are recoded accordingly, starting with the DV (pregnancy intentions). Weights (wmweight) and PSU (HH1) are included prior to telling R the survey design:
library(haven)
wm <- read_sav("Desktop/Data/Ghana MICS 2011 SPSS Datasets/wm.sav")
#Pregnancy Intentions: "When you got pregnant, did you want to get pregnant at that time?"
wm$pregintnt<-Recode(wm$DB1,recodes="1=0;2=1;9=NA")
#Age: "How old are you?"
wm$agec<-Recode(wm$WB2CA,recodes="15:19='15-19';20:29='20-29';30:39='30-39';40:49='40-49';else=NA",as.factor=T)
wm$agec<-relevel(wm$agec,ref='40-49')
#Educational Attainment:"What is the highest level of school you attended?"
wm$educ<-Recode(wm$WB4,recodes="0='1Pre';1='2Prim';2='3Sec';3='4High';4:6=NA", as.factor=T)
wm$educ<-relevel(wm$educ,ref='4High')
#Contraceptive Use:"Are you currently doing something or using any method to delay or avoid getting pregnant?"
wm$cntrcpt<-Recode(wm$CP2,recodes="1='1Cntrcpt';2='2NoCntrcpt';8:9=NA",as.factor=T)
#Marriage:"Are you currently married or living together with a man as if married?"
wm$marrg<-Recode(wm$MA1,recodes="1='1Marrd';2='2Cohab';3='3Sngl';9=NA",as.factor=T)
sub<-wm%>%
select(pregintnt,agec,educ,cntrcpt,marrg,wmweight,HH1) %>%
filter( complete.cases(.))
options(survey.lonely.psu = "adjust")
des<-svydesign(ids=~HH1,weights=~wmweight, data=sub)
The percentage of unintended preganancies by age is plotted and a survey-corrected chi-square test for independence is performed. Results indicate a signficant relationship between unintentional pregnancy and age, F (2.92, 1802.2) = 7.08), p < .05
library(ggplot2)
cat<-svyby(formula = ~pregintnt, by = ~agec, design = des, FUN = svymean, na.rm=T)
svychisq(~pregintnt+agec, design = des)
##
## Pearson's X^2: Rao & Scott adjustment
##
## data: svychisq(~pregintnt + agec, design = des)
## F = 7.0848, ndf = 2.9208, ddf = 1802.2000, p-value = 0.0001169
qplot(x=cat$agec,y=cat$pregintnt, data=cat ,xlab="Age", ylab="% Pregnancy Intentions" )+
geom_errorbar(aes(x=agec, ymin=pregintnt-1.96*se,ymax= pregintnt+1.96*se), width=.25)+
ggtitle(label = "% of Ghanaian Women with Unintended Pregnancies by Age")
The percentage of unintended preganancies by educational attainment is plotted and another survey-corrected chi-square test for independence performed. Results indicate that there is a significant relationship between unintended pregnancies and educational attainment, F (2.55, 1575) = 3.0584), p < .05
library(ggplot2)
cat<-svyby(formula = ~pregintnt, by = ~educ, design = des, FUN = svymean, na.rm=T)
svychisq(~pregintnt+educ, design = des)
##
## Pearson's X^2: Rao & Scott adjustment
##
## data: svychisq(~pregintnt + educ, design = des)
## F = 3.0584, ndf = 2.5528, ddf = 1575.0000, p-value = 0.03489
qplot(x=cat$educ,y=cat$pregintnt, data=cat ,xlab="Education", ylab="% Pregnancy Intentions" )+
geom_errorbar(aes(x=educ, ymin=pregintnt-1.96*se,ymax= pregintnt+1.96*se), width=.25)+
ggtitle(label = "% of Ghanaian Women with Unintended Pregnancies by Education")
Another cross tabulation is performed with both covariates:
catdog<-svyby(formula = ~pregintnt, by = ~agec+educ, design = des, FUN = svymean, na.rm=T)
catdog
## agec educ pregintnt se
## 40-49.4High 40-49 4High 0.0000000 0.00000000
## 15-19.4High 15-19 4High 1.0000000 0.00000000
## 20-29.4High 20-29 4High 0.4296271 0.06602270
## 30-39.4High 30-39 4High 0.3164448 0.07577250
## 15-19.1Pre 15-19 1Pre 1.0000000 0.00000000
## 20-29.1Pre 20-29 1Pre 0.6519622 0.28341094
## 30-39.1Pre 30-39 1Pre 0.6443783 0.32433651
## 40-49.2Prim 40-49 2Prim 0.6148152 0.09270836
## 15-19.2Prim 15-19 2Prim 0.7408586 0.08687212
## 20-29.2Prim 20-29 2Prim 0.5062683 0.04490183
## 30-39.2Prim 30-39 2Prim 0.4371690 0.04363962
## 40-49.3Sec 40-49 3Sec 0.5159563 0.10379197
## 15-19.3Sec 15-19 3Sec 0.7577802 0.06695439
## 20-29.3Sec 20-29 3Sec 0.5394866 0.03658701
## 30-39.3Sec 30-39 3Sec 0.4763527 0.04716227
Here, I run Logit & Probit Models, both of which include all covariates in the intial survey design. Odds Ratios and Confidence intervals are performed as well:
fit.logit<-svyglm(pregintnt~agec+educ+cntrcpt+marrg,
design= des,
family=binomial)
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
exp(coef(fit.logit))#odds ratios
## (Intercept) agec15-19 agec20-29 agec30-39
## 0.7557847 1.6217066 0.7188254 0.6664974
## educ1Pre educ2Prim educ3Sec cntrcpt2NoCntrcpt
## 4.7259128 1.3710198 1.5127027 0.7160433
## marrg2Cohab marrg3Sngl
## 2.3729842 2.7594222
exp(confint(fit.logit)) #confidence intervals for odds ratios
## 2.5 % 97.5 %
## (Intercept) 0.3604891 1.5845432
## agec15-19 0.7077647 3.7158284
## agec20-29 0.3781836 1.3662939
## agec30-39 0.3530257 1.2583188
## educ1Pre 0.9304392 24.0039886
## educ2Prim 0.8193033 2.2942605
## educ3Sec 0.9293711 2.4621698
## cntrcpt2NoCntrcpt 0.5165317 0.9926168
## marrg2Cohab 1.6979681 3.3163485
## marrg3Sngl 1.7064306 4.4621860
#probit model
fit.probit<-svyglm(pregintnt~agec+educ+cntrcpt+marrg,
design=des,
family=binomial(link= "probit"))
## Warning in eval(family$initialize): non-integer #successes in a binomial
## glm!
exp(confint(fit.probit)) #confidence intervals for odds ratios
## 2.5 % 97.5 %
## (Intercept) 0.5291179 1.3232040
## agec15-19 0.8127579 2.2292783
## agec20-29 0.5496716 1.2194245
## agec30-39 0.5265041 1.1583438
## educ1Pre 1.0459086 6.7854585
## educ2Prim 0.8853588 1.6638537
## educ3Sec 0.9562822 1.7383989
## cntrcpt2NoCntrcpt 0.6662039 0.9956586
## marrg2Cohab 1.3921459 2.1050337
## marrg3Sngl 1.3965949 2.5231537
myexp<-function(x){exp(x)}
stargazer(fit.logit,
fit.probit,
type = "text",
style="demography",
covariate.labels=c("Age 15-19" ,"Age 20-29","Age 30-39","Preschool", "Primary School", "Secondary Education","Contraceptive Use", "Living with Male, Unmarried", "Single"),
ci=T, column.sep.width = "10pt",apply.coef = myexp)
##
## -----------------------------------------------------------
## pregintnt
## survey-weighted survey-weighted
## logistic probit
## Model 1 Model 2
## -----------------------------------------------------------
## Age 15-19 1.622*** 1.346***
## (0.793, 2.451) (0.842, 1.851)
## Age 20-29 0.719* 0.819***
## (0.077, 1.361) (0.420, 1.217)
## Age 30-39 0.666* 0.781***
## (0.031, 1.302) (0.387, 1.175)
## Preschool 4.726*** 2.664***
## (3.101, 6.351) (1.729, 3.599)
## Primary School 1.371*** 1.214***
## (0.856, 1.886) (0.898, 1.529)
## Secondary Education 1.513*** 1.289***
## (1.026, 2.000) (0.991, 1.588)
## Contraceptive Use 0.716*** 0.814***
## (0.389, 1.043) (0.614, 1.015)
## Living with Male, Unmarried 2.373*** 1.712***
## (2.038, 2.708) (1.505, 1.919)
## Single 2.759*** 1.877***
## (2.279, 3.240) (1.581, 2.173)
## Constant 0.756* 0.837***
## (0.015, 1.496) (0.378, 1.295)
## N 1,291 1,291
## Log Likelihood -786.262 -786.233
## AIC 1,592.525 1,592.466
## -----------------------------------------------------------
## *p < .05; **p < .01; ***p < .001
The models mutually suggest that women aged 15-19 had an increased chance of reporting an unintended pregnancy compared to their middle aged counterparts (reference group = Age 40-49). Similarly, Ghanaian women with just a primary school education had an increased chance in reporting an unintended pregnancy as well. Interestingly, the probit model has stronger significance levels than the logit model across age cagtegories and the constant. Next, I retrieve marginal effects (and plot them) to determine the rate of change between the dependent and independent variables.
#Logit marginal effects
log.marg<-coef(fit.logit)*mean(dlogis(predict(fit.logit)), na.rm=T)
#Progit marginal effects
prob.marg<-coef(fit.probit)*mean(dnorm(predict(fit.probit)), na.rm=T)
effects<-data.frame(effect=c(log.marg, prob.marg),
term=rep(names(log.marg),2),
model=c(rep("logit", length(log.marg)),rep("probit", length(log.marg)))
)
effects%>%
ggplot(aes(x=term, y=effect))+geom_point(aes(color=model, group=model, shape=model),
position=position_jitterdodge(jitter.width = 1),
size=2)+
ylab("Marginal Effect")+
xlab("Model Term")+
geom_abline(intercept = 0, slope=0)+
theme(axis.text.x = element_text(angle = 45, hjust = 1))+
ggtitle(label = "Comparison of marginal effects in Logit and Probit models")
We see that marginal effects are comparable between the logit and probit model, though large distinctions exist with the first two age categories of 15-19 and 20-29.
#get a series of predicted probabilites for different "types" of people for each model
#expand.grid will generate all possible combinations of values you specify
dat<-expand.grid(educ=levels(wm$educ), agec=levels(wm$agec),marrg=levels(wm$marrg),cntrcpt=levels(wm$cntrcpt))
#You MAY need to get rid of impossible cases here
#generate the fitted values
fit<-predict(fit.logit, newdat=dat,type="response")
fitp<-predict(fit.probit, newdat=dat,type="response")
#add the values to the fake data
dat$fitted.prob.lrm<-round(fit, 3)
dat$fitted.prob.pro<-round(fitp, 3)
#Print the fitted probabilities for the first 20 cases
knitr::kable(head(dat, n=20))
| educ | agec | marrg | cntrcpt | fitted.prob.lrm | fitted.prob.pro |
|---|---|---|---|---|---|
| 4High | 40-49 | 1Marrd | 1Cntrcpt | 0.430 | 0.429 |
| 1Pre | 40-49 | 1Marrd | 1Cntrcpt | 0.781 | 0.789 |
| 2Prim | 40-49 | 1Marrd | 1Cntrcpt | 0.509 | 0.506 |
| 3Sec | 40-49 | 1Marrd | 1Cntrcpt | 0.533 | 0.530 |
| 4High | 15-19 | 1Marrd | 1Cntrcpt | 0.551 | 0.547 |
| 1Pre | 15-19 | 1Marrd | 1Cntrcpt | 0.853 | 0.864 |
| 2Prim | 15-19 | 1Marrd | 1Cntrcpt | 0.627 | 0.623 |
| 3Sec | 15-19 | 1Marrd | 1Cntrcpt | 0.650 | 0.645 |
| 4High | 20-29 | 1Marrd | 1Cntrcpt | 0.352 | 0.353 |
| 1Pre | 20-29 | 1Marrd | 1Cntrcpt | 0.720 | 0.726 |
| 2Prim | 20-29 | 1Marrd | 1Cntrcpt | 0.427 | 0.427 |
| 3Sec | 20-29 | 1Marrd | 1Cntrcpt | 0.451 | 0.451 |
| 4High | 30-39 | 1Marrd | 1Cntrcpt | 0.335 | 0.335 |
| 1Pre | 30-39 | 1Marrd | 1Cntrcpt | 0.704 | 0.710 |
| 2Prim | 30-39 | 1Marrd | 1Cntrcpt | 0.409 | 0.408 |
| 3Sec | 30-39 | 1Marrd | 1Cntrcpt | 0.432 | 0.432 |
| 4High | 40-49 | 2Cohab | 1Cntrcpt | 0.642 | 0.640 |
| 1Pre | 40-49 | 2Cohab | 1Cntrcpt | 0.894 | 0.910 |
| 2Prim | 40-49 | 2Cohab | 1Cntrcpt | 0.711 | 0.710 |
| 3Sec | 40-49 | 2Cohab | 1Cntrcpt | 0.731 | 0.730 |
There are several alarming findings in this analysis. Specifically, I specify the probability of reporting an unintended pregnancy for pre-school educated, cohabitiating woman aged between 40-49 years, compared to a highly educated woman aged between 15-19.
dat[which(dat$educ=="1Pre"&dat$agec=="40-49"&dat$marrg=="2Cohab"),]
## educ agec marrg cntrcpt fitted.prob.lrm fitted.prob.pro
## 18 1Pre 40-49 2Cohab 1Cntrcpt 0.894 0.910
## 66 1Pre 40-49 2Cohab 2NoCntrcpt 0.859 0.872
dat[which(dat$educ=="4High"&dat$agec=="30-39"&dat$marrg=="1Marrd"),]
## educ agec marrg cntrcpt fitted.prob.lrm fitted.prob.pro
## 13 4High 30-39 1Marrd 1Cntrcpt 0.335 0.335
## 61 4High 30-39 1Marrd 2NoCntrcpt 0.265 0.264
Here, we see that highly educated Ghanaian women in union and aged between 30-39 have only a 33.5 percent probability of reporting an unintentional pregnancy; contrast to pre-school educated Ghanaian women, unmarried but cohabiting with male, and aged between 40-49 - who have an estimated probability of reporting unintentional pregancy of about 89 percent. Women with higher probabilities however are mostly using some sort of contraceptive, while those who have less likelihood in reporting unintential pregnancy are not currently using a contraceptive, which could potentially put them at risk.