Author: YJ Choi
Date: 2019-06-05

Introduction

This is a markdown file to reproduce analysis and results for a published paper: Reporting sterilization as a current contraceptive method among sterilized women: lessons learned from a population with high sterilization rates, Rajasthan, India (Contraception. 2019 Feb;99(2):131-136). The abstract, link to the complete paper, and full publication information is available here.

There are three sections in this document for: accessing public data; preparing the analysis data file (i.e., data processing/manipulation); and conducting analysis. Analysis section contains code that is only relevant to reproduce results presented in the paper. The paper was prepared using Stata and equivalent Stata do file is available at GitHub.

"R code is shown in a gray box"

Output is in white box. (Not all reults are shown in the output document. To see results, simply remove the argument "results=FALSE" from a code chunk)

1. Data Access

Data came from the Performance Monitoring and Accountability 2020 (PMA2020) survey conducted in Rajasthan, India, in early 2017. PMA2020 survey data are available for the public, and can be accessed here: https://www.pma2020.org/request-access-to-datasets. Request and download “Round 2 (2017) Rajasthan Household/Female survey” data. The file has been saved as “INR2_Rajasthan_HHQFQ.dta” in Stata format in a directory.

2. Data Management

2.1. Load data

Load the downloaded public data file into “dtapublic” (If you run this markdown file on your computer, you will need to change the file path according to your directory setting).

suppressPackageStartupMessages(library (haven)) 
dtapublic<-data.frame(read_dta("C:/YJ/Data PMA/rawHHQFQ/INR2_Rajasthan_HHQFQ.dta"))

2.2. Define analysis sample

The unit of analysis is a female respondent in the survey. Since the public data file include both household and female level observations, keep only female level observations. Then, keep only completed interviews with de facto sample. The analysis dataset, “dta”, now has 6015 observations, all de-facto women 15-49 years of age who live in the sampled households.

suppressPackageStartupMessages(library (dplyr))
#keep only female observations (as opposed to household)
dta<-filter(dtapublic, FQmetainstanceID!="")
#keep only completed interviews
dta<-filter(dta, HHQ_result==1 & FRS_result==1)
#keep only defacto observations
dta<-filter(dta, usually_live==1)
#check the number of observations in the analysis dataset, "dta"
nrow(dta)

## [1] 6015

NOTE: In the above code chunk, “dta<-filter(dta, usually_live==1)” should have been “dta<-filter(dta, last_night==1)”: i.e., there was an error in classifying de-factor vs. de-jure population, though small (2.1% of women who completed interview, 129/6089). The correct total number of de-factor population is 6034. Revised analysis results (not presented here, but can be produced with the changed code, “dta<-filter(dta, last_night==1)”) are very comparable with the published results. In any case, YC is responsible for this error.

2.3. Construct analysis variables

First, create demographic and background characteristics variables.

suppressPackageStartupMessages(library(car))

agebreaks<-c(15,20,25,30,35,40,45,50)
agelabels<-c("15-19","20-24","25-29","30-34","35-39","40-44","45-49")
dta<-mutate(dta,
    #Sampling weight
    xweight=FQweight/1000000,    
    #Age
    xage=FQ_age,
    xagegroup5=cut(FQ_age, breaks=agebreaks, right = FALSE, 
                   labels = agelabels),
    xagegroup5_3=xagegroup5=="25-29",
    xagegroup5_4=xagegroup5=="30-34",
    xagegroup5_5=xagegroup5=="35-39",
    xagegroup5_6=xagegroup5=="40-44",
    xagegroup5_7=xagegroup5=="45-49",
    #Marital status
    xunion  =(FQmarital_status==1 | FQmarital_status==2),
    xmarried=(FQmarital_status==1),
    #Sexual activity 
    xsa=( (last_time_sex==1 & last_time_sex_value<=30 & last_time_sex_value>=0) 
        | (last_time_sex==2 & last_time_sex_value<=4 & last_time_sex_value>=0)  
        | (last_time_sex==3 & last_time_sex_value<=1 & last_time_sex_value>=0) ), 
    xsa = ifelse(is.na(xsa), 0, xsa),
    #residential area
    xurban  = ur==1,
    #Household wealth  
    xwealth5= wealthquintile,
    xpoor   = wealthquintile==1,
    xrich   = wealthquintile==5,
    #education 
    school  = ifelse(school<0, NA, school),
    xedu3   =cut(school, breaks=c(0,1,3,5), right=FALSE,
                 labels = c("None", "Primary", ">Secondary") ),
    xedupri =cut(school, breaks=c(0,1,5), right=FALSE, 
                 labels = c("<Primary", ">= Primary" ) ),
    xedusec =cut(school, breaks=c(0,3,5), right=FALSE, 
                 labels = c("<Secondary", ">=Secondary") ),
    #caste
    caste   = ifelse(caste<0, NA, caste),
    xcaste_sc   = caste==1,
    xcaste_st   = caste==2,
    xcaste_obc  = caste==3,
    xcaste4 =cut(caste, breaks=c(1,2,3,4,5), right=FALSE,
                 labels = c("SC", "ST", "OBC", "General") ),
    #Religion 
    xhindu  = religion==1,
    xreligion=cut(religion, breaks=c(1,2,3,100), right = FALSE, 
                  labels = c("Hindu", "Muslim", "Other") ) 
)

suppressPackageStartupMessages(library(Hmisc))

label(dta$xagegroup5)<-"5-year age group"
label(dta$xunion)<-"currently in union"
label(dta$xmarried)<-"currently married"
label(dta$xsa)<-"sexual activity in the last 30 days"
label(dta$xurban)<-"living in urban"
label(dta$xwealth5)<-"household wealth quintile" 
label(dta$xpoor)<-"household wealth quintile: lowest" 
label(dta$xrich)<-"household wealth quintile: highest"
label(dta$xedu3)<-"3-category education" 
label(dta$xedupri)<-"attended primary school or higher"
label(dta$xedusec)<-"attended secondary school or higher"
label(dta$xcaste4)<-"4-category caste" 
label(dta$xreligion)<-"religion"

Next, construct variables regarding contraception, including sterilization.

suppressPackageStartupMessages(library(lubridate))
suppressPackageStartupMessages(library(dplyr))

dta<-mutate(dta, 
    #using any, modern, and female sterilization 
    xuseany=current_user==1,
    xusemodern=current_methodnum>=2 & current_methodnum<30,
    xusest=current_methodnum==1,
    #F Sterilization spontanously reported
    xst_report  = femalester==1,    
    xst_report  = ifelse(is.na(xst_report), 0, xst_report),
    #F sterilization reported only when probed
    xst_probe   = sterilization_probe ==1,
    xst_probe   = ifelse(is.na(xst_probe), 0, xst_probe),
    #F sterilization overall 
    xsteril = xst_report==1 | xst_probe==1,
    # prep variables to calculate times since sterilization 
    # xinterview_cmc    "timing of interview (CMC)"     
    xinterview_mo = month(FQdoi_correctedSIF),
    xinterview_yr = year(FQdoi_correctedSIF),
    xinterview_cmc = 12*(xinterview_yr - 1900) + xinterview_mo,
    # xbegin_cmc    "timing of start using methods (CMC)" 
    xbegin_mo = month(begin_usingSIF),
    xbegin_yr = year(begin_usingSIF),
    xbegin_cmc = 12*(xbegin_yr - 1900) + xbegin_mo,
    # time since start 
    xtime=xinterview_cmc - xbegin_cmc, 
    xtime   = ifelse(xsteril==FALSE, NA, xtime),
    xtime_yr = trunc((xtime/12), 0),
    xtime_yr5=cut(xtime_yr, 
                  breaks=c(0, 5, 10, 15, 20, 30), 
                  right = FALSE, 
                  labels = c("0-4","5-9","10-14","15-19",">=20")), 
    xtime_yr5_2=xtime_yr5=="5-9",
    xtime_yr5_3=xtime_yr5=="10-14",
    xtime_yr5_4=xtime_yr5=="15-19",
    xtime_yr5_5=xtime_yr5==">=20"
)

suppressPackageStartupMessages(library(Hmisc))

label(dta$xuseany)<-"currently using any methods, reported" 
label(dta$xusemodern)<-"curretnly using modern methods, reported" 
label(dta$xusest)<-"currently using sterilization, reported"

label(dta$xst_report)<-"sterilized, reported"
label(dta$xst_probe)<-"sterilized, reported when only probed"
label(dta$xsteril)<-"sterilized"
label(dta$xinterview_cmc)<-"time of interview (CMC)" 
label(dta$xbegin_cmc)<-"time when started the method (CMC)"
label(dta$xtime)<-"time since method adoption (month)" 
label(dta$xtime_yr)<-"time since method adoption (year)"           
label(dta$xtime_yr5)<-"5-year category since method adoption"

3. Data Analysis

There are three tables in results. All estimates (e.g., proportion, odds ratio) were adjusted for survey sample design, since PMA2020 household surveys are sample surveys (see details regarding the sample design here and here). The number of observations in all tables is unweighted number.

3.1. Table 1

Table 1 presents study population characteristics: overall and by sterilization status.

First, set the survey design

Note: There were 55 cases (last_night!=1, due to the error mentioned above!) that have missing weight. In Stata, these cases are excluded in calculation, but R needs no missing weights for “surveydesign”. Thus, those cases are dropped here (see the code line for “dta2” below) to be consistent with the STATA results. Technically, we could have replaced missing values with the cluster-specific weight values. But, that was not done for the paper.

suppressPackageStartupMessages(library(survey))
suppressPackageStartupMessages(library(dplyr))

dta2<-dta%>% filter(!is.na(dta$xweight))
dtaw <- svydesign(ids = ~ EA_ID,
                  data = dta2,
                  strata = ~ur,   
                  weights = ~xweight)

Distribution of background characteristics among the entire study sample (second column in Table 1)

round(svymean( ~ xage, dtaw), 1)
    mysvymean<-svymean( ~ xage, dtaw)
    round(confint(mysvymean), 1)
round(prop.table(svytable(~ xmarried, dtaw))*100, 1)
round(prop.table(svytable(~ xsa, dtaw))*100, 1)
round(prop.table(svytable(~ xurban, dtaw))*100, 1)
round(prop.table(svytable(~ xwealth5, dtaw))*100, 1)
round(prop.table(svytable(~ xedu3, dtaw))*100, 1)
round(prop.table(svytable(~ xcaste4, dtaw))*100, 1)
round(prop.table(svytable(~ xreligion, dtaw))*100, 1)

Then, distribution of background characteristics among those NOT sterilized (third column in Table 1)

# level of strilization 
table(dta$xsteril)
round(prop.table(svytable(~ xsteril, dtaw))*100, 1)

dtaneverst<-dta2%>% filter(dta2$xsteril==FALSE) 
nrow(dtaneverst)
dtaw <- svydesign(ids = ~ EA_ID,
                  data = dtaneverst,
                  strata = ~ur,   
                  weights = ~xweight)

round(svymean( ~ xage, dtaw), 1)
    mysvymean<-svymean( ~ xage, dtaw)
    round(confint(mysvymean), 1)
round(prop.table(svytable(~ xmarried, dtaw))*100, 1)
round(prop.table(svytable(~ xsa, dtaw))*100, 1)
round(prop.table(svytable(~ xurban, dtaw))*100, 1)
round(prop.table(svytable(~ xwealth5, dtaw))*100, 1)
round(prop.table(svytable(~ xedu3, dtaw))*100, 1)
round(prop.table(svytable(~ xcaste4, dtaw))*100, 1)
round(prop.table(svytable(~ xreligion, dtaw))*100, 1)

Finally, distribution of background characteristics among those EVER sterilized (fourth column in Table 1)

dtaeverst<-dta2%>% filter(dta2$xsteril==TRUE) 
nrow(dtaeverst)
dtaw <- svydesign(ids = ~ EA_ID,
                  data = dtaeverst,
                  strata = ~ur,   
                  weights = ~xweight)

round(svymean( ~ xage, dtaw), 1)
    mysvymean<-svymean( ~ xage, dtaw)
    round(confint(mysvymean), 1)
round(prop.table(svytable(~ xmarried, dtaw))*100, 1)
round(prop.table(svytable(~ xsa, dtaw))*100, 1)
round(prop.table(svytable(~ xurban, dtaw))*100, 1)
round(prop.table(svytable(~ xwealth5, dtaw))*100, 1)
round(prop.table(svytable(~ xedu3, dtaw))*100, 1)
round(prop.table(svytable(~ xcaste4, dtaw))*100, 1)
round(prop.table(svytable(~ xreligion, dtaw))*100, 1)
round(prop.table(svytable(~ xtime_yr5, dtaw))*100, 1)

3.2. Table 2

Table 2 shows, among women who are sterilized, the percentage who reported sterilization as a current contraceptive method by background characteristics.

First, denominator in each sub-group

dtaeverst<-dta %>% filter(dta$xsteril==TRUE) 

table(dtaeverst$xagegroup5)
table(dtaeverst$xmarried)
table(dtaeverst$xsa)
table(dtaeverst$xurban)
table(dtaeverst$xwealth5)
table(dtaeverst$xedu3 )         
table(dtaeverst$xcaste4)
table(dtaeverst$xreligion)
table(dtaeverst$xtime_yr5)

Then, proportion of reporting by background characteristics

dtaeverst<-dta2 %>% filter(dta2$xsteril==TRUE) 
dtaw <- svydesign(ids = ~ EA_ID,
                  data = dtaeverst,
                  strata = ~ur,   
                  weights = ~xweight)

svyby(~xst_report, ~xmarried, dtaw, svymean)
svyby(~xst_report, ~ xsa, dtaw, svymean)
svyby(~xst_report, ~ xurban, dtaw, svymean)
svyby(~xst_report, ~ xwealth5, dtaw, svymean)
svyby(~xst_report, ~ xedu3, dtaw, svymean)
svyby(~xst_report, ~ xcaste4, dtaw, svymean)
svyby(~xst_report, ~ xreligion, dtaw, svymean)
svyby(~xst_report, ~ xtime_yr5, dtaw, svymean)

3.3. Table 3

Finally, Table 3 presents differential odds of reporting sterilization as a current contraceptive method by background characteristics, among women who are sterilized. The table shows both bivariate and multivariable logistic regression analyses results.

First, bivariate regression by each covariate:

dtaw <- svydesign(ids = ~ EA_ID,
                  data = dtaeverst,
                  strata = ~ur,   
                  weights = ~xweight)

mysvyglm<-svyglm(xst_report ~ xmarried, dtaw, family=binomial)
    summaryexp<-round(as.data.frame(exp(summary(mysvyglm)$coef)), 2)
    confint   <-round(as.data.frame(exp(confint(mysvyglm))), 2)
    summary   <-round(as.data.frame(summary(mysvyglm)$coef), 3)
    # Odds ratio, 95% confidence interval, and p-value
    cbind(summaryexp[,1], confint[,1:2], summary[,4])[-1,]

mysvyglm<-svyglm(xst_report ~ xsa, dtaw, family=binomial)
    summaryexp<-round(as.data.frame(exp(summary(mysvyglm)$coef)), 2)
    confint   <-round(as.data.frame(exp(confint(mysvyglm))), 2)
    summary   <-round(as.data.frame(summary(mysvyglm)$coef), 3)
    cbind(summaryexp[,1], confint[,1:2], summary[,4])[-1,]

mysvyglm<-svyglm(xst_report ~ xurban, dtaw, family=binomial)
    summaryexp<-round(as.data.frame(exp(summary(mysvyglm)$coef)), 2)
    confint   <-round(as.data.frame(exp(confint(mysvyglm))), 2)
    summary   <-round(as.data.frame(summary(mysvyglm)$coef), 3)
    cbind(summaryexp[,1], confint[,1:2], summary[,4])[-1,]

mysvyglm<-svyglm(xst_report ~ xedupri, dtaw, family=binomial)
    summaryexp<-round(as.data.frame(exp(summary(mysvyglm)$coef)), 2)
    confint   <-round(as.data.frame(exp(confint(mysvyglm))), 2)
    summary   <-round(as.data.frame(summary(mysvyglm)$coef), 3)
    cbind(summaryexp[,1], confint[,1:2], summary[,4])[-1,]

mysvyglm<-svyglm(xst_report ~ xpoor+xrich, dtaw, family=binomial)
    summaryexp<-round(as.data.frame(exp(summary(mysvyglm)$coef)), 2)
    confint   <-round(as.data.frame(exp(confint(mysvyglm))), 2)
    summary   <-round(as.data.frame(summary(mysvyglm)$coef), 3)
    cbind(summaryexp[,1], confint[,1:2], summary[,4])[-1,]

mysvyglm<-svyglm(xst_report ~ xcaste_sc+xcaste_st+xcaste_obc, dtaw, family=binomial)
    summaryexp<-round(as.data.frame(exp(summary(mysvyglm)$coef)), 2)
    confint   <-round(as.data.frame(exp(confint(mysvyglm))), 2)
    summary   <-round(as.data.frame(summary(mysvyglm)$coef), 3)
    cbind(summaryexp[,1], confint[,1:2], summary[,4])[-1,]
    
mysvyglm<-svyglm(xst_report ~ xhindu, dtaw, family=binomial)
    summaryexp<-round(as.data.frame(exp(summary(mysvyglm)$coef)), 2)
    confint   <-round(as.data.frame(exp(confint(mysvyglm))), 2)
    summary   <-round(as.data.frame(summary(mysvyglm)$coef), 3)
    cbind(summaryexp[,1], confint[,1:2], summary[,4])[-1,]
    
mysvyglm<-svyglm(xst_report ~ xtime_yr5_2+xtime_yr5_3+xtime_yr5_4+xtime_yr5_5, dtaw, family=binomial)
    summaryexp<-round(as.data.frame(exp(summary(mysvyglm)$coef)), 2)
    confint   <-round(as.data.frame(exp(confint(mysvyglm))), 2)
    summary   <-round(as.data.frame(summary(mysvyglm)$coef), 3)
    cbind(summaryexp[,1], confint[,1:2], summary[,4])[-1,]

Multivariable regression:

dtaw <- svydesign(ids = ~ EA_ID,
                  data = dtaeverst,
                  strata = ~ur,   
                  weights = ~xweight)

mysvyglm<-svyglm(xst_report ~                      
                 xagegroup5_3+xagegroup5_4+xagegroup5_5+xagegroup5_6+xagegroup5_7+
                 xmarried+xsa+xurban+xedupri+xpoor+xrich+
                 xcaste_sc+xcaste_st+xcaste_obc+xhindu+
                 xtime_yr5_2+xtime_yr5_3+xtime_yr5_4+xtime_yr5_5,
                 dtaw, family=binomial)

summaryexp<-round(as.data.frame(exp(summary(mysvyglm)$coef)), 2)
confint   <-round(as.data.frame(exp(confint(mysvyglm))), 2)
summary   <-round(as.data.frame(summary(mysvyglm)$coef), 3)

# Odds ratio, 95% confidence interval, and p-value
cbind(summaryexp[,1], confint[,1:2], summary[,4])[-1,]

                  summaryexp[, 1] 2.5 % 97.5 % summary[, 4]
xagegroup5_3TRUE             1.00  0.53   1.91        0.995
xagegroup5_4TRUE             0.87  0.48   1.60        0.665
xagegroup5_5TRUE             0.84  0.42   1.67        0.623
xagegroup5_6TRUE             0.88  0.42   1.85        0.742
xagegroup5_7TRUE             0.86  0.37   1.99        0.718
xmarriedTRUE                 2.29  1.20   4.36        0.013
xsa                          1.01  0.61   1.69        0.955
xurbanTRUE                   0.76  0.40   1.46        0.411
xedupri>= Primary            1.07  0.78   1.46        0.685
xpoorTRUE                    0.54  0.32   0.90        0.020
xrichTRUE                    1.30  0.88   1.91        0.188
xcaste_scTRUE                1.98  1.15   3.42        0.016
xcaste_stTRUE                2.24  1.06   4.77        0.038
xcaste_obcTRUE               2.28  1.33   3.90        0.003
xhinduTRUE                   1.08  0.52   2.24        0.841
xtime_yr5_2TRUE              0.94  0.65   1.35        0.742
xtime_yr5_3TRUE              0.97  0.57   1.64        0.904
xtime_yr5_4TRUE              1.24  0.70   2.19        0.455
xtime_yr5_5TRUE              1.67  0.68   4.11        0.268

Note: In the published paper, it appears that distribution of educational level in Table 1 and Table 2 (denominator) were based on a categorical variable that was incorrectly created initially. However, the level of differential reporting presented in both Tables 2 and 3 are based on the correct variable, as shown in this document. It was the first author’s oversight.