PMA panel data analysis practice 1

Revised Matt Gunther’s code for a webinar on COVID-19 impact on contraceptive use/dynamics. Source code and other resources - https://pma.ipums.org/pma/tutorials.shtml)

(Updated on 2022-10-19)

# Load data into R with `ipumsr`
dat <- read_ipums_micro(
  ddi = "data/pma_00005.xml",
  data = "data/pma_00005.dat.gz"
)

1 Analytic Sample

# As of 10/18/2022, data from four countries are available. But, DRC data creates an error - something related with Strata
temp <- dat %>% filter(COUNTRY==2)
summary(temp$STRATA_1)
# As of 10/18/2022, data from four countries are available. But, DRC data creates an error - something related with Strata. will need to assign artificial value since there was no strata in Kinshasa or Kongo Central. 
# Drop DRC for now. 
dat <- dat %>% filter(COUNTRY!=2) 

PMA uses an open panel design - women may enter the panel after Phase 1, and they may be lost to follow-up after any phase.

See RESULTFQ

Women who enter the panel at Phase 2 are NA for all variables at Phase 1.

dat %>% count(RESULTFQ_1) %>% kable()
RESULTFQ_1 n
1 18941
2 334
3 20
4 99
5 41
10 127
95 1
99 1
NA 5543

Women whose households were not found again after Phase 1 are NA for all variables at Phase 2.

dat %>% count(RESULTFQ_2) %>% kable()
RESULTFQ_2 n
1 19735
2 239
3 40
4 193
5 24
7 44
10 212
95 9
96 214
99 1490
NA 2907

We will only include women who were available and completed the Female Questionnaire for both Phase 1 and Phase 2.

dat <- dat %>% filter(RESULTFQ_1 == 1 & RESULTFQ_2 == 1)

dat %>% count(RESULTFQ_1, RESULTFQ_2) %>% kable()
RESULTFQ_1 RESULTFQ_2 n
1 1 14632

Additionally, PMA samples are only valid for the de facto population: women who slept in the household the night before the Household interview.

See RESIDENT

dat %>% count(RESIDENT_1) %>% kable()
RESIDENT_1 n
11 158
21 215
22 14259

We’ll also drop cases where the woman was not part of the de facto population in either Phase 1 or Phase 2.

dat <- dat %>% filter(RESIDENT_1 %in% c(11, 22) & RESIDENT_2 %in% c(11, 22))

How many cases remain?

dat %>% count(COUNTRY) %>% kable()
COUNTRY n
1 5208
7 6935
9 2087

2 Recoding Independent variables

PMA surveys contain many categorical variables. These are usually represented as factors in R.

In an IPUMS data extract, you won’t see factors!

Instead, we generate labelled numeric variables (note the label in brackets).

dat %>% ipums_var_label(CVINCOMELOSS_2)

[1] “Income loss resulted from COVID-19 restrictions”

dat %>% count(CVINCOMELOSS_2) %>% kable()
CVINCOMELOSS_2 n
0 781
1 9221
97 8
99 4220

The ipumsr package contains tools for working with labelled IPUMS data.

Usually, we handle codes like 99 [NIU (not in universe)] before transforming other missing data to NA.

dat %>% count(CVINCOMELOSS_2, HHINCOMELOSSAMT_2) %>% kable()
CVINCOMELOSS_2 HHINCOMELOSSAMT_2 n
0 2 654
0 3 127
1 2 6624
1 3 2597
97 2 8
99 1 4204
99 98 16

Tip:

Information the code `NIU (not in universe)` can always be found on a variable's [universe tab](https://pma.ipums.org/pma-action/variables/CVINCOMELOSS#universe_section).       
For [CVINCOMELOSS_2](https://pma.ipums.org/pma-action/variables/CVINCOMELOSS), `99 [NIU (not in universe)]` may indicate that the household experienced *no income loss in the last year*, or it may indicate that [HHINCOMELOSSAMT_2](https://pma.ipums.org/pma-action/variables/HHINCOMELOSSAMT) is `98 [No response or missing]`. 

We should treat the `NIU` women from households without *any* income loss as "No" in [CVINCOMELOSS_2](https://pma.ipums.org/pma-action/variables/CVINCOMELOSS). 
dat <- dat %>% 
  mutate(
    CVINCOMELOSS_2 = CVINCOMELOSS_2 %>% 
      labelled::recode_if(HHINCOMELOSSAMT_2 == 1, 0)
  )

dat %>% count(CVINCOMELOSS_2, HHINCOMELOSSAMT_2) %>% kable()
CVINCOMELOSS_2 HHINCOMELOSSAMT_2 n
0 1 4204
0 2 654
0 3 127
1 2 6624
1 3 2597
97 2 8
99 98 16

Next, we’ll use NA to represent the remaining values above 90:

  • 97 [Don't know] and
  • remaining cases marked 99 [NIU (not in universe)]
dat <- dat %>% 
  mutate(
    CVINCOMELOSS_2 = CVINCOMELOSS_2 %>% 
      lbl_na_if(~.val > 90)
  )  

dat %>% count(CVINCOMELOSS_2, HHINCOMELOSSAMT_2) %>% kable()
CVINCOMELOSS_2 HHINCOMELOSSAMT_2 n
0 1 4204
0 2 654
0 3 127
1 2 6624
1 3 2597
NA 2 8
NA 98 16

Once you’re done with labels, we recommend transforming key variables into factors with forcats::as_factor.

The forcats package is included when you load library(tidyverse).

dat <- dat %>% mutate(CVINCOMELOSS_2 = as_factor(CVINCOMELOSS_2))

dat %>% count(CVINCOMELOSS_2) %>% kable()
CVINCOMELOSS_2 n
No 4985
Yes 9221
NA 24

This will make categorical variables easier to use in data visualization and as “dummy” variables in regression analysis.

Likert-style questions can be treated as factors, too.

dat %>% ipums_var_label(COVIDCONCERN_2)

[1] “Concerned about getting infected”

dat %>% count(COVIDCONCERN_2) %>% kable()
COVIDCONCERN_2 n
1 507
2 842
3 2865
4 10002
5 11
98 3

This time we’ll treat codes 5 and above as NA.

dat <- dat %>% 
  mutate(
    COVIDCONCERN_2 = COVIDCONCERN_2 %>% 
      lbl_na_if(~.val >= 5) %>% 
      as_factor()
  )

dat %>% count(COVIDCONCERN_2) %>% kable()
COVIDCONCERN_2 n
Not concerned 507
A little concerned 842
Concerned 2865
Very concerned 10002
NA 14

You can apply the same transformation to several variables with help from dplyr::across.

dplyr is another package included when you load library(tidyverse).

dat <- dat %>% 
  mutate(
    across(
        c(COUNTRY, URBAN, WEALTHT_2, EDUCATTGEN_2),
        ~.x %>% lbl_na_if(~.val >= 90) %>% as_factor()
    )
  )

Often, it’s important to set a reference group against which all dummy variables will be compared.

You can manually specify a refernece group when you set factor “levels” with a function like forcats::fct_relevel.

dat <- dat %>% 
  mutate(
    AGE_2 = case_when(
      AGE_2 < 25 ~ "15-24",
      AGE_2 < 35 ~ "25-34",
      AGE_2 < 50 ~ "35-49"
    ),
    AGE_2 = AGE_2 %>% fct_relevel("15-24", "25-34", "35-49")
  ) 

3 Dependent variables

We’ll use our recoded variables to model the likelihood of contraceptive method adoption and discontinuation between phases.

See CP

dat <- dat %>% filter(CP_1 < 90 & CP_2 < 90)

dat %>% count(CP_1, CP_2) %>% kable()
CP_1 CP_2 n
0 0 6425
0 1 2153
1 0 1328
1 1 4321

A woman has adopted a method if she was not using one at Phase 1, but then reported using one at Phase 2.

She has discontinued a method if she did use one at Phase 1, but no longer uses one at Phase 2.

dat <- dat %>% 
  mutate(
    FPSTATUS = case_when(
      CP_1 == 1 & CP_2 == 1 ~ "User",
      CP_1 == 0 & CP_2 == 0 ~ "Non-user",
      CP_1 == 1 & CP_2 == 0 ~ "Discontinued",
      CP_1 == 0 & CP_2 == 1 ~ "Adopted"
    ),
    FPSTATUS = fct_infreq(FPSTATUS)
  )

Un-weighted sample proportions for FPSTATUS can be found with count and prop.table:

dat_nowt <- dat %>% 
  group_by(COUNTRY) %>% 
  count(FPSTATUS) %>% 
  mutate(prop = prop.table(n))

dat_nowt %>% kable()
COUNTRY FPSTATUS n prop
Burkina Faso Non-user 2589 0.4972153
Burkina Faso User 1241 0.2383330
Burkina Faso Adopted 821 0.1576724
Burkina Faso Discontinued 556 0.1067793
Kenya Non-user 2518 0.3631382
Kenya User 2676 0.3859244
Kenya Adopted 1118 0.1612345
Kenya Discontinued 622 0.0897029
Nigeria Non-user 1318 0.6318313
Nigeria User 404 0.1936721
Nigeria Adopted 214 0.1025887
Nigeria Discontinued 150 0.0719080

We’ll plot this table with ggplot2 (also included with the tidyverse).

dat_nowt %>% 
  ggplot(aes(x = prop, y = FPSTATUS, fill = FPSTATUS)) +  
  geom_bar(stat = "identity") +
  facet_wrap(~COUNTRY) + theme_minimal() + 
  theme(axis.title = element_blank(), legend.position = "none") + 
  scale_x_continuous(labels = scales::label_percent())

For weighted population estimates, use as_survey_design and survey_mean from the srvyr package.

Use prop = TRUE to adjust standard errors near 0% or 100% for proportions.

dat_wtd <- dat %>% 
  as_survey_design(weight = PANELWEIGHT, id = EAID_1, strata = STRATA_1) %>%
  group_by(COUNTRY, FPSTATUS) %>% 
  summarise(survey_mean(prop = TRUE, prop_method = "logit", vartype = "ci"))

dat_wtd %>% kable()
COUNTRY FPSTATUS coef _low _upp
Burkina Faso Non-user 0.5626078 0.5286594 0.5959798
Burkina Faso User 0.1877959 0.1639061 0.2142754
Burkina Faso Adopted 0.1496680 0.1329277 0.1681077
Burkina Faso Discontinued 0.0999283 0.0870280 0.1145009
Kenya Non-user 0.3782787 0.3608162 0.3960627
Kenya User 0.3657702 0.3495995 0.3822493
Kenya Adopted 0.1647120 0.1530223 0.1771079
Kenya Discontinued 0.0912391 0.0829785 0.1002321
Nigeria Non-user 0.6513030 0.5952480 0.7034622
Nigeria User 0.1788674 0.1496337 0.2123860
Nigeria Adopted 0.1013731 0.0821519 0.1244817
Nigeria Discontinued 0.0684565 0.0545171 0.0856373
dat_wtd %>% 
  ggplot(aes(x = coef, y = FPSTATUS, fill = FPSTATUS)) +  
  geom_bar(stat = "identity") +
  geom_errorbar(aes(xmin = `_low`, xmax = `_upp`), width = 0.2, alpha = 0.5) +
  facet_wrap(~COUNTRY) + theme_minimal() + 
  theme(axis.title = element_blank(), legend.position = "none") + 
  scale_x_continuous(labels = scales::label_percent())

4 Analysis

4.1 Adoption among women who did not use any method at Phase 1

The same srvyr toolkit can be used to model our dependent variables with survey::svyglm.

Consider women who were not using a method at Phase 1:

adopt_glm <- dat %>% 
  filter(CP_1 == 0) %>%
  mutate(adopt = FPSTATUS == "Adopted") %>% 
  group_by(COUNTRY) %>%
  summarise(
    adopt = cur_data() %>% 
      as_survey_design(weight = PANELWEIGHT, id = EAID_1, strata = STRATA_1)%>%
      svyglm(
        adopt ~ CVINCOMELOSS_2 + COVIDCONCERN_2 + URBAN + WEALTHT_2 + EDUCATTGEN_2 + AGE_2,
        family = "quasibinomial", design = .
      ) %>% 
      broom::tidy(exp = TRUE) %>% 
      mutate(sig = gtools::stars.pval(p.value)) %>% 
      list()
  )
#adopt_glm

For Phase 1 non-users in Burkina Faso, very high levels of concern about becoming infected with COVID-19 are significantly associated with higher chances of adopting a contraceptive method (relative to women who had no such concern).

Lesser levels of concern are not statistically significant, nor is household income loss from COVID-19.

adopt_glm %>% 
    filter(COUNTRY == "Burkina Faso") %>% 
    unnest(adopt) %>% 
    mutate_at(vars("estimate", "std.error", "statistic", "p.value"), 
              ~round(., 3)) %>%
    kable()
COUNTRY term estimate std.error statistic p.value sig
Burkina Faso (Intercept) 0.099 0.366 -6.328 0.000 ***
Burkina Faso CVINCOMELOSS_2Yes 1.284 0.155 1.611 0.109
Burkina Faso COVIDCONCERN_2A little concerned 1.801 0.373 1.576 0.117
Burkina Faso COVIDCONCERN_2Concerned 1.367 0.351 0.891 0.375
Burkina Faso COVIDCONCERN_2Very concerned 1.906 0.318 2.025 0.045 *
Burkina Faso URBANUrban 1.359 0.186 1.648 0.101
Burkina Faso WEALTHT_2Middle tertile 0.962 0.170 -0.230 0.818
Burkina Faso WEALTHT_2Highest tertile 0.735 0.220 -1.398 0.164
Burkina Faso EDUCATTGEN_2Primary/Middle school 1.436 0.161 2.240 0.027 *
Burkina Faso EDUCATTGEN_2Secondary/post-primary 1.509 0.181 2.270 0.025 *
Burkina Faso EDUCATTGEN_2Tertiary/post-secondary 2.301 0.352 2.367 0.019 *
Burkina Faso AGE_225-34 1.720 0.180 3.018 0.003 **
Burkina Faso AGE_235-49 1.078 0.195 0.385 0.701

In Kenya, neither of these measures are significantly predictive of adoption among non-users.

adopt_glm %>% 
  filter(COUNTRY == "Kenya") %>% 
    unnest(adopt) %>% 
    mutate_at(vars("estimate", "std.error", "statistic", "p.value"), 
              ~round(., 3)) %>%
    kable()
COUNTRY term estimate std.error statistic p.value sig
Kenya (Intercept) 0.104 0.371 -6.093 0.000 ***
Kenya CVINCOMELOSS_2Yes 1.197 0.111 1.614 0.108
Kenya COVIDCONCERN_2A little concerned 0.645 0.351 -1.249 0.213
Kenya COVIDCONCERN_2Concerned 0.794 0.256 -0.900 0.369
Kenya COVIDCONCERN_2Very concerned 0.907 0.254 -0.385 0.700
Kenya URBANUrban 1.168 0.147 1.057 0.292
Kenya WEALTHT_2Middle tertile 1.119 0.112 1.007 0.315
Kenya WEALTHT_2Highest tertile 0.817 0.151 -1.344 0.180
Kenya EDUCATTGEN_2Primary/Middle school 2.296 0.273 3.048 0.003 **
Kenya EDUCATTGEN_2Secondary/post-primary 2.874 0.302 3.495 0.001 ***
Kenya EDUCATTGEN_2Tertiary/post-secondary 3.631 0.306 4.207 0.000 ***
Kenya AGE_225-34 3.057 0.128 8.710 0.000 ***
Kenya AGE_235-49 1.606 0.131 3.618 0.000 ***

In Nigeria, …

adopt_glm %>% 
  filter(COUNTRY == "Nigeria") %>% 
    unnest(adopt) %>% 
    mutate_at(vars("estimate", "std.error", "statistic", "p.value"), 
              ~round(., 3)) %>%
    kable()
COUNTRY term estimate std.error statistic p.value sig
Nigeria (Intercept) 0.006 0.707 -7.242 0.000 ***
Nigeria CVINCOMELOSS_2Yes 1.571 0.274 1.649 0.104
Nigeria COVIDCONCERN_2A little concerned 1.138 0.440 0.293 0.770
Nigeria COVIDCONCERN_2Concerned 0.881 0.393 -0.322 0.749
Nigeria COVIDCONCERN_2Very concerned 1.057 0.398 0.140 0.889
Nigeria URBANUrban 2.993 0.463 2.366 0.021 *
Nigeria WEALTHT_2Middle tertile 0.996 0.254 -0.017 0.986
Nigeria WEALTHT_2Highest tertile 1.060 0.236 0.249 0.805
Nigeria EDUCATTGEN_2Primary/Middle school 3.543 0.427 2.965 0.004 **
Nigeria EDUCATTGEN_2Secondary/post-primary 4.431 0.438 3.396 0.001 **
Nigeria EDUCATTGEN_2Tertiary/post-secondary 5.465 0.403 4.212 0.000 ***
Nigeria AGE_225-34 3.493 0.247 5.054 0.000 ***
Nigeria AGE_235-49 2.318 0.245 3.429 0.001 **

4.2 Discontinuation among women who were using a method at Phase 1

What about method dicontinuation for women who were using a method at Phase 1?

stop_glm <- dat %>% 
    filter(CP_1 == 1) %>% 
    mutate(stop = FPSTATUS == "Discontinued") %>% 
    group_by(COUNTRY) %>%
    summarise(
        stop = cur_data() %>% 
        as_survey_design(weight = PANELWEIGHT, id = EAID_1, strata = STRATA_1)%>%
        svyglm(
            stop ~ CVINCOMELOSS_2 + COVIDCONCERN_2 + URBAN + WEALTHT_2 + EDUCATTGEN_2 + AGE_2,
            family = "quasibinomial", design = .
            )%>% 
        broom::tidy(exp = TRUE) %>% 
        mutate(sig = gtools::stars.pval(p.value)) %>% 
        list()
    )

#stop_glm 

This time, neither of the COVID-19 measures are significantly associated with discontinuation for Phase 1 contraceptive users in Burkina Faso.

stop_glm %>% 
    filter(COUNTRY == "Burkina Faso") %>% 
    unnest(stop) %>% 
    mutate_at(vars("estimate", "std.error", "statistic", "p.value"), 
              ~round(., 3)) %>%
    kable()
COUNTRY term estimate std.error statistic p.value sig
Burkina Faso (Intercept) 0.536 0.407 -1.535 0.127
Burkina Faso CVINCOMELOSS_2Yes 0.857 0.185 -0.835 0.405
Burkina Faso COVIDCONCERN_2A little concerned 1.182 0.442 0.379 0.705
Burkina Faso COVIDCONCERN_2Concerned 0.922 0.425 -0.192 0.848
Burkina Faso COVIDCONCERN_2Very concerned 0.935 0.335 -0.200 0.842
Burkina Faso URBANUrban 0.951 0.231 -0.215 0.830
Burkina Faso WEALTHT_2Middle tertile 1.469 0.211 1.824 0.070 .
Burkina Faso WEALTHT_2Highest tertile 0.797 0.238 -0.952 0.343
Burkina Faso EDUCATTGEN_2Primary/Middle school 1.294 0.212 1.215 0.226
Burkina Faso EDUCATTGEN_2Secondary/post-primary 1.161 0.250 0.596 0.552
Burkina Faso EDUCATTGEN_2Tertiary/post-secondary 0.787 0.289 -0.828 0.409
Burkina Faso AGE_225-34 1.109 0.215 0.482 0.630
Burkina Faso AGE_235-49 0.784 0.244 -0.997 0.320

However, higher levels concern with becoming infected with COVID-19 are significantly associated with higher odds of discontinuation for Phase 1 contraceptive users in Kenya.

stop_glm %>% 
    filter(COUNTRY == "Kenya") %>% 
    unnest(stop) %>% 
    mutate_at(vars("estimate", "std.error", "statistic", "p.value"), 
              ~round(., 3)) %>%
    kable()
COUNTRY term estimate std.error statistic p.value sig
Kenya (Intercept) 0.098 0.877 -2.650 0.009 **
Kenya CVINCOMELOSS_2Yes 1.007 0.158 0.043 0.965
Kenya COVIDCONCERN_2A little concerned 7.683 0.694 2.937 0.004 **
Kenya COVIDCONCERN_2Concerned 4.236 0.723 1.998 0.047 *
Kenya COVIDCONCERN_2Very concerned 3.770 0.719 1.846 0.066 .
Kenya URBANUrban 1.120 0.135 0.836 0.404
Kenya WEALTHT_2Middle tertile 0.843 0.153 -1.115 0.266
Kenya WEALTHT_2Highest tertile 0.888 0.180 -0.659 0.511
Kenya EDUCATTGEN_2Primary/Middle school 0.787 0.349 -0.687 0.493
Kenya EDUCATTGEN_2Secondary/post-primary 0.958 0.367 -0.118 0.907
Kenya EDUCATTGEN_2Tertiary/post-secondary 1.099 0.397 0.238 0.812
Kenya AGE_225-34 0.783 0.153 -1.603 0.110
Kenya AGE_235-49 0.589 0.153 -3.450 0.001 ***

In Nigeria, …

stop_glm %>% 
    filter(COUNTRY == "Nigeria") %>% 
    unnest(stop) %>% 
    mutate_at(vars("estimate", "std.error", "statistic", "p.value"), 
              ~round(., 3)) %>%
    kable()
COUNTRY term estimate std.error statistic p.value sig
Nigeria (Intercept) 1.647 0.844 0.591 0.557
Nigeria CVINCOMELOSS_2Yes 0.813 0.279 -0.744 0.460
Nigeria COVIDCONCERN_2A little concerned 0.773 0.504 -0.511 0.611
Nigeria COVIDCONCERN_2Concerned 0.659 0.529 -0.789 0.433
Nigeria COVIDCONCERN_2Very concerned 0.710 0.409 -0.837 0.406
Nigeria URBANUrban 0.569 0.526 -1.072 0.288
Nigeria WEALTHT_2Middle tertile 1.413 0.349 0.990 0.326
Nigeria WEALTHT_2Highest tertile 1.518 0.387 1.078 0.286
Nigeria EDUCATTGEN_2Primary/Middle school 0.288 0.564 -2.203 0.032 *
Nigeria EDUCATTGEN_2Secondary/post-primary 0.458 0.642 -1.216 0.229
Nigeria EDUCATTGEN_2Tertiary/post-secondary 0.371 0.674 -1.469 0.148
Nigeria AGE_225-34 1.291 0.412 0.620 0.538
Nigeria AGE_235-49 1.042 0.409 0.101 0.920

For more R tips for IPUMS data, check out: