This is a markdown file to reproduce analysis and results for a published paper: Reporting sterilization as a current contraceptive method among sterilized women: lessons learned from a population with high sterilization rates, Rajasthan, India (Contraception. 2019 Feb;99(2):131-136). The abstract, link to the complete paper, and full publication information is available here.
There are three sections in this document for: accessing public data; preparing the analysis data file (i.e., data processing/manipulation); and conducting analysis. Analysis section contains code that is only relevant to reproduce results presented in the paper. The paper was prepared using Stata and equivalent Stata do file is available at GitHub.
"R code is shown in a gray box"
Output is in white box. (Not all reults are shown in the output document. To see results, simply remove the argument "results=FALSE" from a code chunk)
Data came from the Performance Monitoring and Accountability 2020 (PMA2020) survey conducted in Rajasthan, India, in early 2017. PMA2020 survey data are available for the public, and can be accessed here: https://www.pma2020.org/request-access-to-datasets. Request and download “Round 2 (2017) Rajasthan Household/Female survey” data. The file has been saved as “INR2_Rajasthan_HHQFQ.dta” in Stata format in a directory.
Load the downloaded public data file into “dtapublic” (If you run this markdown file on your computer, you will need to change the file path according to your directory setting).
suppressPackageStartupMessages(library (haven))
dtapublic<-data.frame(read_dta("C:/YJ/Data PMA/rawHHQFQ/INR2_Rajasthan_HHQFQ.dta"))
The unit of analysis is a female respondent in the survey. Since the public data file include both household and female level observations, keep only female level observations. Then, keep only completed interviews with de facto sample. The analysis dataset, “dta”, now has 6015 observations, all de-facto women 15-49 years of age who live in the sampled households.
suppressPackageStartupMessages(library (dplyr))
#keep only female observations (as opposed to household)
dta<-filter(dtapublic, FQmetainstanceID!="")
#keep only completed interviews
dta<-filter(dta, HHQ_result==1 & FRS_result==1)
#keep only defacto observations
dta<-filter(dta, usually_live==1)
#check the number of observations in the analysis dataset, "dta"
nrow(dta)
## [1] 6015
NOTE: In the above code chunk, “dta<-filter(dta, usually_live==1)
” should have been “dta<-filter(dta, last_night==1)
”: i.e., there was an error in classifying de-factor vs. de-jure population, though small (2.1% of women who completed interview, 129/6089). The correct total number of de-factor population is 6034. Revised analysis results (not presented here, but can be produced with the changed code, “dta<-filter(dta, last_night==1)
”) are very comparable with the published results. In any case, YC is responsible for this error.
First, create demographic and background characteristics variables.
suppressPackageStartupMessages(library(car))
agebreaks<-c(15,20,25,30,35,40,45,50)
agelabels<-c("15-19","20-24","25-29","30-34","35-39","40-44","45-49")
dta<-mutate(dta,
#Sampling weight
xweight=FQweight/1000000,
#Age
xage=FQ_age,
xagegroup5=cut(FQ_age, breaks=agebreaks, right = FALSE,
labels = agelabels),
xagegroup5_3=xagegroup5=="25-29",
xagegroup5_4=xagegroup5=="30-34",
xagegroup5_5=xagegroup5=="35-39",
xagegroup5_6=xagegroup5=="40-44",
xagegroup5_7=xagegroup5=="45-49",
#Marital status
xunion =(FQmarital_status==1 | FQmarital_status==2),
xmarried=(FQmarital_status==1),
#Sexual activity
xsa=( (last_time_sex==1 & last_time_sex_value<=30 & last_time_sex_value>=0)
| (last_time_sex==2 & last_time_sex_value<=4 & last_time_sex_value>=0)
| (last_time_sex==3 & last_time_sex_value<=1 & last_time_sex_value>=0) ),
xsa = ifelse(is.na(xsa), 0, xsa),
#residential area
xurban = ur==1,
#Household wealth
xwealth5= wealthquintile,
xpoor = wealthquintile==1,
xrich = wealthquintile==5,
#education
school = ifelse(school<0, NA, school),
xedu3 =cut(school, breaks=c(0,1,3,5), right=FALSE,
labels = c("None", "Primary", ">Secondary") ),
xedupri =cut(school, breaks=c(0,1,5), right=FALSE,
labels = c("<Primary", ">= Primary" ) ),
xedusec =cut(school, breaks=c(0,3,5), right=FALSE,
labels = c("<Secondary", ">=Secondary") ),
#caste
caste = ifelse(caste<0, NA, caste),
xcaste_sc = caste==1,
xcaste_st = caste==2,
xcaste_obc = caste==3,
xcaste4 =cut(caste, breaks=c(1,2,3,4,5), right=FALSE,
labels = c("SC", "ST", "OBC", "General") ),
#Religion
xhindu = religion==1,
xreligion=cut(religion, breaks=c(1,2,3,100), right = FALSE,
labels = c("Hindu", "Muslim", "Other") )
)
suppressPackageStartupMessages(library(Hmisc))
label(dta$xagegroup5)<-"5-year age group"
label(dta$xunion)<-"currently in union"
label(dta$xmarried)<-"currently married"
label(dta$xsa)<-"sexual activity in the last 30 days"
label(dta$xurban)<-"living in urban"
label(dta$xwealth5)<-"household wealth quintile"
label(dta$xpoor)<-"household wealth quintile: lowest"
label(dta$xrich)<-"household wealth quintile: highest"
label(dta$xedu3)<-"3-category education"
label(dta$xedupri)<-"attended primary school or higher"
label(dta$xedusec)<-"attended secondary school or higher"
label(dta$xcaste4)<-"4-category caste"
label(dta$xreligion)<-"religion"
Next, construct variables regarding contraception, including sterilization.
suppressPackageStartupMessages(library(lubridate))
suppressPackageStartupMessages(library(dplyr))
dta<-mutate(dta,
#using any, modern, and female sterilization
xuseany=current_user==1,
xusemodern=current_methodnum>=2 & current_methodnum<30,
xusest=current_methodnum==1,
#F Sterilization spontanously reported
xst_report = femalester==1,
xst_report = ifelse(is.na(xst_report), 0, xst_report),
#F sterilization reported only when probed
xst_probe = sterilization_probe ==1,
xst_probe = ifelse(is.na(xst_probe), 0, xst_probe),
#F sterilization overall
xsteril = xst_report==1 | xst_probe==1,
# prep variables to calculate times since sterilization
# xinterview_cmc "timing of interview (CMC)"
xinterview_mo = month(FQdoi_correctedSIF),
xinterview_yr = year(FQdoi_correctedSIF),
xinterview_cmc = 12*(xinterview_yr - 1900) + xinterview_mo,
# xbegin_cmc "timing of start using methods (CMC)"
xbegin_mo = month(begin_usingSIF),
xbegin_yr = year(begin_usingSIF),
xbegin_cmc = 12*(xbegin_yr - 1900) + xbegin_mo,
# time since start
xtime=xinterview_cmc - xbegin_cmc,
xtime = ifelse(xsteril==FALSE, NA, xtime),
xtime_yr = trunc((xtime/12), 0),
xtime_yr5=cut(xtime_yr,
breaks=c(0, 5, 10, 15, 20, 30),
right = FALSE,
labels = c("0-4","5-9","10-14","15-19",">=20")),
xtime_yr5_2=xtime_yr5=="5-9",
xtime_yr5_3=xtime_yr5=="10-14",
xtime_yr5_4=xtime_yr5=="15-19",
xtime_yr5_5=xtime_yr5==">=20"
)
suppressPackageStartupMessages(library(Hmisc))
label(dta$xuseany)<-"currently using any methods, reported"
label(dta$xusemodern)<-"curretnly using modern methods, reported"
label(dta$xusest)<-"currently using sterilization, reported"
label(dta$xst_report)<-"sterilized, reported"
label(dta$xst_probe)<-"sterilized, reported when only probed"
label(dta$xsteril)<-"sterilized"
label(dta$xinterview_cmc)<-"time of interview (CMC)"
label(dta$xbegin_cmc)<-"time when started the method (CMC)"
label(dta$xtime)<-"time since method adoption (month)"
label(dta$xtime_yr)<-"time since method adoption (year)"
label(dta$xtime_yr5)<-"5-year category since method adoption"
There are three tables in results. All estimates (e.g., proportion, odds ratio) were adjusted for survey sample design, since PMA2020 household surveys are sample surveys (see details regarding the sample design here and here). The number of observations in all tables is unweighted number.
Table 1 presents study population characteristics: overall and by sterilization status.
First, set the survey design
Note: There were 55 cases (last_night!=1, due to the error mentioned above!) that have missing weight. In Stata, these cases are excluded in calculation, but R needs no missing weights for “surveydesign”. Thus, those cases are dropped here (see the code line for “dta2” below) to be consistent with the STATA results. Technically, we could have replaced missing values with the cluster-specific weight values. But, that was not done for the paper.
suppressPackageStartupMessages(library(survey))
suppressPackageStartupMessages(library(dplyr))
dta2<-dta%>% filter(!is.na(dta$xweight))
dtaw <- svydesign(ids = ~ EA_ID,
data = dta2,
strata = ~ur,
weights = ~xweight)
Distribution of background characteristics among the entire study sample (second column in Table 1)
round(svymean( ~ xage, dtaw), 1)
mysvymean<-svymean( ~ xage, dtaw)
round(confint(mysvymean), 1)
round(prop.table(svytable(~ xmarried, dtaw))*100, 1)
round(prop.table(svytable(~ xsa, dtaw))*100, 1)
round(prop.table(svytable(~ xurban, dtaw))*100, 1)
round(prop.table(svytable(~ xwealth5, dtaw))*100, 1)
round(prop.table(svytable(~ xedu3, dtaw))*100, 1)
round(prop.table(svytable(~ xcaste4, dtaw))*100, 1)
round(prop.table(svytable(~ xreligion, dtaw))*100, 1)
Then, distribution of background characteristics among those NOT sterilized (third column in Table 1)
# level of strilization
table(dta$xsteril)
round(prop.table(svytable(~ xsteril, dtaw))*100, 1)
dtaneverst<-dta2%>% filter(dta2$xsteril==FALSE)
nrow(dtaneverst)
dtaw <- svydesign(ids = ~ EA_ID,
data = dtaneverst,
strata = ~ur,
weights = ~xweight)
round(svymean( ~ xage, dtaw), 1)
mysvymean<-svymean( ~ xage, dtaw)
round(confint(mysvymean), 1)
round(prop.table(svytable(~ xmarried, dtaw))*100, 1)
round(prop.table(svytable(~ xsa, dtaw))*100, 1)
round(prop.table(svytable(~ xurban, dtaw))*100, 1)
round(prop.table(svytable(~ xwealth5, dtaw))*100, 1)
round(prop.table(svytable(~ xedu3, dtaw))*100, 1)
round(prop.table(svytable(~ xcaste4, dtaw))*100, 1)
round(prop.table(svytable(~ xreligion, dtaw))*100, 1)
Finally, distribution of background characteristics among those EVER sterilized (fourth column in Table 1)
dtaeverst<-dta2%>% filter(dta2$xsteril==TRUE)
nrow(dtaeverst)
dtaw <- svydesign(ids = ~ EA_ID,
data = dtaeverst,
strata = ~ur,
weights = ~xweight)
round(svymean( ~ xage, dtaw), 1)
mysvymean<-svymean( ~ xage, dtaw)
round(confint(mysvymean), 1)
round(prop.table(svytable(~ xmarried, dtaw))*100, 1)
round(prop.table(svytable(~ xsa, dtaw))*100, 1)
round(prop.table(svytable(~ xurban, dtaw))*100, 1)
round(prop.table(svytable(~ xwealth5, dtaw))*100, 1)
round(prop.table(svytable(~ xedu3, dtaw))*100, 1)
round(prop.table(svytable(~ xcaste4, dtaw))*100, 1)
round(prop.table(svytable(~ xreligion, dtaw))*100, 1)
round(prop.table(svytable(~ xtime_yr5, dtaw))*100, 1)
Table 2 shows, among women who are sterilized, the percentage who reported sterilization as a current contraceptive method by background characteristics.
First, denominator in each sub-group
dtaeverst<-dta %>% filter(dta$xsteril==TRUE)
table(dtaeverst$xagegroup5)
table(dtaeverst$xmarried)
table(dtaeverst$xsa)
table(dtaeverst$xurban)
table(dtaeverst$xwealth5)
table(dtaeverst$xedu3 )
table(dtaeverst$xcaste4)
table(dtaeverst$xreligion)
table(dtaeverst$xtime_yr5)
Then, proportion of reporting by background characteristics
dtaeverst<-dta2 %>% filter(dta2$xsteril==TRUE)
dtaw <- svydesign(ids = ~ EA_ID,
data = dtaeverst,
strata = ~ur,
weights = ~xweight)
svyby(~xst_report, ~xmarried, dtaw, svymean)
svyby(~xst_report, ~ xsa, dtaw, svymean)
svyby(~xst_report, ~ xurban, dtaw, svymean)
svyby(~xst_report, ~ xwealth5, dtaw, svymean)
svyby(~xst_report, ~ xedu3, dtaw, svymean)
svyby(~xst_report, ~ xcaste4, dtaw, svymean)
svyby(~xst_report, ~ xreligion, dtaw, svymean)
svyby(~xst_report, ~ xtime_yr5, dtaw, svymean)
Finally, Table 3 presents differential odds of reporting sterilization as a current contraceptive method by background characteristics, among women who are sterilized. The table shows both bivariate and multivariable logistic regression analyses results.
First, bivariate regression by each covariate:
dtaw <- svydesign(ids = ~ EA_ID,
data = dtaeverst,
strata = ~ur,
weights = ~xweight)
mysvyglm<-svyglm(xst_report ~ xmarried, dtaw, family=binomial)
summaryexp<-round(as.data.frame(exp(summary(mysvyglm)$coef)), 2)
confint <-round(as.data.frame(exp(confint(mysvyglm))), 2)
summary <-round(as.data.frame(summary(mysvyglm)$coef), 3)
# Odds ratio, 95% confidence interval, and p-value
cbind(summaryexp[,1], confint[,1:2], summary[,4])[-1,]
mysvyglm<-svyglm(xst_report ~ xsa, dtaw, family=binomial)
summaryexp<-round(as.data.frame(exp(summary(mysvyglm)$coef)), 2)
confint <-round(as.data.frame(exp(confint(mysvyglm))), 2)
summary <-round(as.data.frame(summary(mysvyglm)$coef), 3)
cbind(summaryexp[,1], confint[,1:2], summary[,4])[-1,]
mysvyglm<-svyglm(xst_report ~ xurban, dtaw, family=binomial)
summaryexp<-round(as.data.frame(exp(summary(mysvyglm)$coef)), 2)
confint <-round(as.data.frame(exp(confint(mysvyglm))), 2)
summary <-round(as.data.frame(summary(mysvyglm)$coef), 3)
cbind(summaryexp[,1], confint[,1:2], summary[,4])[-1,]
mysvyglm<-svyglm(xst_report ~ xedupri, dtaw, family=binomial)
summaryexp<-round(as.data.frame(exp(summary(mysvyglm)$coef)), 2)
confint <-round(as.data.frame(exp(confint(mysvyglm))), 2)
summary <-round(as.data.frame(summary(mysvyglm)$coef), 3)
cbind(summaryexp[,1], confint[,1:2], summary[,4])[-1,]
mysvyglm<-svyglm(xst_report ~ xpoor+xrich, dtaw, family=binomial)
summaryexp<-round(as.data.frame(exp(summary(mysvyglm)$coef)), 2)
confint <-round(as.data.frame(exp(confint(mysvyglm))), 2)
summary <-round(as.data.frame(summary(mysvyglm)$coef), 3)
cbind(summaryexp[,1], confint[,1:2], summary[,4])[-1,]
mysvyglm<-svyglm(xst_report ~ xcaste_sc+xcaste_st+xcaste_obc, dtaw, family=binomial)
summaryexp<-round(as.data.frame(exp(summary(mysvyglm)$coef)), 2)
confint <-round(as.data.frame(exp(confint(mysvyglm))), 2)
summary <-round(as.data.frame(summary(mysvyglm)$coef), 3)
cbind(summaryexp[,1], confint[,1:2], summary[,4])[-1,]
mysvyglm<-svyglm(xst_report ~ xhindu, dtaw, family=binomial)
summaryexp<-round(as.data.frame(exp(summary(mysvyglm)$coef)), 2)
confint <-round(as.data.frame(exp(confint(mysvyglm))), 2)
summary <-round(as.data.frame(summary(mysvyglm)$coef), 3)
cbind(summaryexp[,1], confint[,1:2], summary[,4])[-1,]
mysvyglm<-svyglm(xst_report ~ xtime_yr5_2+xtime_yr5_3+xtime_yr5_4+xtime_yr5_5, dtaw, family=binomial)
summaryexp<-round(as.data.frame(exp(summary(mysvyglm)$coef)), 2)
confint <-round(as.data.frame(exp(confint(mysvyglm))), 2)
summary <-round(as.data.frame(summary(mysvyglm)$coef), 3)
cbind(summaryexp[,1], confint[,1:2], summary[,4])[-1,]
Multivariable regression:
dtaw <- svydesign(ids = ~ EA_ID,
data = dtaeverst,
strata = ~ur,
weights = ~xweight)
mysvyglm<-svyglm(xst_report ~
xagegroup5_3+xagegroup5_4+xagegroup5_5+xagegroup5_6+xagegroup5_7+
xmarried+xsa+xurban+xedupri+xpoor+xrich+
xcaste_sc+xcaste_st+xcaste_obc+xhindu+
xtime_yr5_2+xtime_yr5_3+xtime_yr5_4+xtime_yr5_5,
dtaw, family=binomial)
summaryexp<-round(as.data.frame(exp(summary(mysvyglm)$coef)), 2)
confint <-round(as.data.frame(exp(confint(mysvyglm))), 2)
summary <-round(as.data.frame(summary(mysvyglm)$coef), 3)
# Odds ratio, 95% confidence interval, and p-value
cbind(summaryexp[,1], confint[,1:2], summary[,4])[-1,]
summaryexp[, 1] 2.5 % 97.5 % summary[, 4]
xagegroup5_3TRUE 1.00 0.53 1.91 0.995
xagegroup5_4TRUE 0.87 0.48 1.60 0.665
xagegroup5_5TRUE 0.84 0.42 1.67 0.623
xagegroup5_6TRUE 0.88 0.42 1.85 0.742
xagegroup5_7TRUE 0.86 0.37 1.99 0.718
xmarriedTRUE 2.29 1.20 4.36 0.013
xsa 1.01 0.61 1.69 0.955
xurbanTRUE 0.76 0.40 1.46 0.411
xedupri>= Primary 1.07 0.78 1.46 0.685
xpoorTRUE 0.54 0.32 0.90 0.020
xrichTRUE 1.30 0.88 1.91 0.188
xcaste_scTRUE 1.98 1.15 3.42 0.016
xcaste_stTRUE 2.24 1.06 4.77 0.038
xcaste_obcTRUE 2.28 1.33 3.90 0.003
xhinduTRUE 1.08 0.52 2.24 0.841
xtime_yr5_2TRUE 0.94 0.65 1.35 0.742
xtime_yr5_3TRUE 0.97 0.57 1.64 0.904
xtime_yr5_4TRUE 1.24 0.70 2.19 0.455
xtime_yr5_5TRUE 1.67 0.68 4.11 0.268
Note: In the published paper, it appears that distribution of educational level in Table 1 and Table 2 (denominator) were based on a categorical variable that was incorrectly created initially. However, the level of differential reporting presented in both Tables 2 and 3 are based on the correct variable, as shown in this document. It was the first author’s oversight.