PMA panel data analysis practice 1

Revised Matt Gunther’s code for a webinar on COVID-19 impact on contraceptive use/dynamics. Source code and other resources - https://pma.ipums.org/pma/tutorials.shtml)

(Updated on 2022-10-19)

# Load data into R with `ipumsr`
dat <- read_ipums_micro(
  ddi = "data/pma_00005.xml",
  data = "data/pma_00005.dat.gz"
)

1 Analytic Sample

# As of 10/18/2022, data from four countries are available. But, DRC data creates an error - something related with Strata
temp <- dat %>% filter(COUNTRY==2)
summary(temp$STRATA_1)

# As of 10/18/2022, data from four countries are available. But, DRC data creates an error - something related with Strata. will need to assign artificial value since there was no strata in Kinshasa or Kongo Central. 
# Drop DRC for now. 
dat <- dat %>% filter(COUNTRY!=2)

PMA uses an open panel design - women may enter the panel after Phase 1, and they may be lost to follow-up after any phase.

See RESULTFQ

Women who enter the panel at Phase 2 are NA for all variables at Phase 1.

dat %>% count(RESULTFQ_1) %>% kable()

RESULTFQ_1	n
1	18941
2	334
3	20
4	99
5	41
10	127
95	1
99	1
NA	5543

Women whose households were not found again after Phase 1 are NA for all variables at Phase 2.

dat %>% count(RESULTFQ_2) %>% kable()

RESULTFQ_2	n
1	19735
2	239
3	40
4	193
5	24
7	44
10	212
95	9
96	214
99	1490
NA	2907

We will only include women who were available and completed the Female Questionnaire for both Phase 1 and Phase 2.

dat <- dat %>% filter(RESULTFQ_1 == 1 & RESULTFQ_2 == 1)

dat %>% count(RESULTFQ_1, RESULTFQ_2) %>% kable()

RESULTFQ_1	RESULTFQ_2	n
1	1	14632

Additionally, PMA samples are only valid for the de facto population: women who slept in the household the night before the Household interview.

See RESIDENT

dat %>% count(RESIDENT_1) %>% kable()

RESIDENT_1	n
11	158
21	215
22	14259

We’ll also drop cases where the woman was not part of the de facto population in either Phase 1 or Phase 2.

dat <- dat %>% filter(RESIDENT_1 %in% c(11, 22) & RESIDENT_2 %in% c(11, 22))

How many cases remain?

dat %>% count(COUNTRY) %>% kable()

COUNTRY	n
1	5208
7	6935
9	2087

2 Recoding Independent variables

PMA surveys contain many categorical variables. These are usually represented as factors in R.

In an IPUMS data extract, you won’t see factors!

Instead, we generate labelled numeric variables (note the label in brackets).

dat %>% ipums_var_label(CVINCOMELOSS_2)

[1] “Income loss resulted from COVID-19 restrictions”

dat %>% count(CVINCOMELOSS_2) %>% kable()

CVINCOMELOSS_2	n
0	781
1	9221
97	8
99	4220

The ipumsr package contains tools for working with labelled IPUMS data.

Usually, we handle codes like 99 [NIU (not in universe)] before transforming other missing data to NA.

dat %>% count(CVINCOMELOSS_2, HHINCOMELOSSAMT_2) %>% kable()

CVINCOMELOSS_2	HHINCOMELOSSAMT_2	n
0	2	654
0	3	127
1	2	6624
1	3	2597
97	2	8
99	1	4204
99	98	16

Tip:

Information the code `NIU (not in universe)` can always be found on a variable's [universe tab](https://pma.ipums.org/pma-action/variables/CVINCOMELOSS#universe_section).       
For [CVINCOMELOSS_2](https://pma.ipums.org/pma-action/variables/CVINCOMELOSS), `99 [NIU (not in universe)]` may indicate that the household experienced *no income loss in the last year*, or it may indicate that [HHINCOMELOSSAMT_2](https://pma.ipums.org/pma-action/variables/HHINCOMELOSSAMT) is `98 [No response or missing]`. 

We should treat the `NIU` women from households without *any* income loss as "No" in [CVINCOMELOSS_2](https://pma.ipums.org/pma-action/variables/CVINCOMELOSS).

dat <- dat %>% 
  mutate(
    CVINCOMELOSS_2 = CVINCOMELOSS_2 %>% 
      labelled::recode_if(HHINCOMELOSSAMT_2 == 1, 0)
  )

dat %>% count(CVINCOMELOSS_2, HHINCOMELOSSAMT_2) %>% kable()

CVINCOMELOSS_2	HHINCOMELOSSAMT_2	n
0	1	4204
0	2	654
0	3	127
1	2	6624
1	3	2597
97	2	8
99	98	16

Next, we’ll use NA to represent the remaining values above 90:

97 [Don't know] and
remaining cases marked 99 [NIU (not in universe)]

dat <- dat %>% 
  mutate(
    CVINCOMELOSS_2 = CVINCOMELOSS_2 %>% 
      lbl_na_if(~.val > 90)
  )  

dat %>% count(CVINCOMELOSS_2, HHINCOMELOSSAMT_2) %>% kable()

CVINCOMELOSS_2	HHINCOMELOSSAMT_2	n
0	1	4204
0	2	654
0	3	127
1	2	6624
1	3	2597
NA	2	8
NA	98	16

Once you’re done with labels, we recommend transforming key variables into factors with forcats::as_factor.

The forcats package is included when you load library(tidyverse).

dat <- dat %>% mutate(CVINCOMELOSS_2 = as_factor(CVINCOMELOSS_2))

dat %>% count(CVINCOMELOSS_2) %>% kable()

CVINCOMELOSS_2	n
No	4985
Yes	9221
NA	24

This will make categorical variables easier to use in data visualization and as “dummy” variables in regression analysis.

Likert-style questions can be treated as factors, too.

dat %>% ipums_var_label(COVIDCONCERN_2)

[1] “Concerned about getting infected”

dat %>% count(COVIDCONCERN_2) %>% kable()

COVIDCONCERN_2	n
1	507
2	842
3	2865
4	10002
5	11
98	3

This time we’ll treat codes 5 and above as NA.

dat <- dat %>% 
  mutate(
    COVIDCONCERN_2 = COVIDCONCERN_2 %>% 
      lbl_na_if(~.val >= 5) %>% 
      as_factor()
  )

dat %>% count(COVIDCONCERN_2) %>% kable()

COVIDCONCERN_2	n
Not concerned	507
A little concerned	842
Concerned	2865
Very concerned	10002
NA	14

You can apply the same transformation to several variables with help from dplyr::across.

dplyr is another package included when you load library(tidyverse).

dat <- dat %>% 
  mutate(
    across(
        c(COUNTRY, URBAN, WEALTHT_2, EDUCATTGEN_2),
        ~.x %>% lbl_na_if(~.val >= 90) %>% as_factor()
    )
  )

Often, it’s important to set a reference group against which all dummy variables will be compared.

You can manually specify a refernece group when you set factor “levels” with a function like forcats::fct_relevel.

dat <- dat %>% 
  mutate(
    AGE_2 = case_when(
      AGE_2 < 25 ~ "15-24",
      AGE_2 < 35 ~ "25-34",
      AGE_2 < 50 ~ "35-49"
    ),
    AGE_2 = AGE_2 %>% fct_relevel("15-24", "25-34", "35-49")
  )

3 Dependent variables

We’ll use our recoded variables to model the likelihood of contraceptive method adoption and discontinuation between phases.

See CP

dat <- dat %>% filter(CP_1 < 90 & CP_2 < 90)

dat %>% count(CP_1, CP_2) %>% kable()

CP_1	CP_2	n
0	0	6425
0	1	2153
1	0	1328
1	1	4321

A woman has adopted a method if she was not using one at Phase 1, but then reported using one at Phase 2.

She has discontinued a method if she did use one at Phase 1, but no longer uses one at Phase 2.

dat <- dat %>% 
  mutate(
    FPSTATUS = case_when(
      CP_1 == 1 & CP_2 == 1 ~ "User",
      CP_1 == 0 & CP_2 == 0 ~ "Non-user",
      CP_1 == 1 & CP_2 == 0 ~ "Discontinued",
      CP_1 == 0 & CP_2 == 1 ~ "Adopted"
    ),
    FPSTATUS = fct_infreq(FPSTATUS)
  )

Un-weighted sample proportions for FPSTATUS can be found with count and prop.table:

dat_nowt <- dat %>% 
  group_by(COUNTRY) %>% 
  count(FPSTATUS) %>% 
  mutate(prop = prop.table(n))

dat_nowt %>% kable()

COUNTRY	FPSTATUS	n	prop
Burkina Faso	Non-user	2589	0.4972153
Burkina Faso	User	1241	0.2383330
Burkina Faso	Adopted	821	0.1576724
Burkina Faso	Discontinued	556	0.1067793
Kenya	Non-user	2518	0.3631382
Kenya	User	2676	0.3859244
Kenya	Adopted	1118	0.1612345
Kenya	Discontinued	622	0.0897029
Nigeria	Non-user	1318	0.6318313
Nigeria	User	404	0.1936721
Nigeria	Adopted	214	0.1025887
Nigeria	Discontinued	150	0.0719080

We’ll plot this table with ggplot2 (also included with the tidyverse).

dat_nowt %>% 
  ggplot(aes(x = prop, y = FPSTATUS, fill = FPSTATUS)) +  
  geom_bar(stat = "identity") +
  facet_wrap(~COUNTRY) + theme_minimal() + 
  theme(axis.title = element_blank(), legend.position = "none") + 
  scale_x_continuous(labels = scales::label_percent())

For weighted population estimates, use as_survey_design and survey_mean from the srvyr package.

Use prop = TRUE to adjust standard errors near 0% or 100% for proportions.

dat_wtd <- dat %>% 
  as_survey_design(weight = PANELWEIGHT, id = EAID_1, strata = STRATA_1) %>%
  group_by(COUNTRY, FPSTATUS) %>% 
  summarise(survey_mean(prop = TRUE, prop_method = "logit", vartype = "ci"))

dat_wtd %>% kable()

COUNTRY	FPSTATUS	coef	_low	_upp
Burkina Faso	Non-user	0.5626078	0.5286594	0.5959798
Burkina Faso	User	0.1877959	0.1639061	0.2142754
Burkina Faso	Adopted	0.1496680	0.1329277	0.1681077
Burkina Faso	Discontinued	0.0999283	0.0870280	0.1145009
Kenya	Non-user	0.3782787	0.3608162	0.3960627
Kenya	User	0.3657702	0.3495995	0.3822493
Kenya	Adopted	0.1647120	0.1530223	0.1771079
Kenya	Discontinued	0.0912391	0.0829785	0.1002321
Nigeria	Non-user	0.6513030	0.5952480	0.7034622
Nigeria	User	0.1788674	0.1496337	0.2123860
Nigeria	Adopted	0.1013731	0.0821519	0.1244817
Nigeria	Discontinued	0.0684565	0.0545171	0.0856373

dat_wtd %>% 
  ggplot(aes(x = coef, y = FPSTATUS, fill = FPSTATUS)) +  
  geom_bar(stat = "identity") +
  geom_errorbar(aes(xmin = `_low`, xmax = `_upp`), width = 0.2, alpha = 0.5) +
  facet_wrap(~COUNTRY) + theme_minimal() + 
  theme(axis.title = element_blank(), legend.position = "none") + 
  scale_x_continuous(labels = scales::label_percent())

4 Analysis

4.1 Adoption among women who did not use any method at Phase 1

The same srvyr toolkit can be used to model our dependent variables with survey::svyglm.

Consider women who were not using a method at Phase 1:

adopt_glm <- dat %>% 
  filter(CP_1 == 0) %>%
  mutate(adopt = FPSTATUS == "Adopted") %>% 
  group_by(COUNTRY) %>%
  summarise(
    adopt = cur_data() %>% 
      as_survey_design(weight = PANELWEIGHT, id = EAID_1, strata = STRATA_1)%>%
      svyglm(
        adopt ~ CVINCOMELOSS_2 + COVIDCONCERN_2 + URBAN + WEALTHT_2 + EDUCATTGEN_2 + AGE_2,
        family = "quasibinomial", design = .
      ) %>% 
      broom::tidy(exp = TRUE) %>% 
      mutate(sig = gtools::stars.pval(p.value)) %>% 
      list()
  )
#adopt_glm

For Phase 1 non-users in Burkina Faso, very high levels of concern about becoming infected with COVID-19 are significantly associated with higher chances of adopting a contraceptive method (relative to women who had no such concern).

Lesser levels of concern are not statistically significant, nor is household income loss from COVID-19.

adopt_glm %>% 
    filter(COUNTRY == "Burkina Faso") %>% 
    unnest(adopt) %>% 
    mutate_at(vars("estimate", "std.error", "statistic", "p.value"), 
              ~round(., 3)) %>%
    kable()

COUNTRY	term	estimate	std.error	statistic	p.value	sig
Burkina Faso	(Intercept)	0.099	0.366	-6.328	0.000	***
Burkina Faso	CVINCOMELOSS_2Yes	1.284	0.155	1.611	0.109
Burkina Faso	COVIDCONCERN_2A little concerned	1.801	0.373	1.576	0.117
Burkina Faso	COVIDCONCERN_2Concerned	1.367	0.351	0.891	0.375
Burkina Faso	COVIDCONCERN_2Very concerned	1.906	0.318	2.025	0.045	*
Burkina Faso	URBANUrban	1.359	0.186	1.648	0.101
Burkina Faso	WEALTHT_2Middle tertile	0.962	0.170	-0.230	0.818
Burkina Faso	WEALTHT_2Highest tertile	0.735	0.220	-1.398	0.164
Burkina Faso	EDUCATTGEN_2Primary/Middle school	1.436	0.161	2.240	0.027	*
Burkina Faso	EDUCATTGEN_2Secondary/post-primary	1.509	0.181	2.270	0.025	*
Burkina Faso	EDUCATTGEN_2Tertiary/post-secondary	2.301	0.352	2.367	0.019	*
Burkina Faso	AGE_225-34	1.720	0.180	3.018	0.003	**
Burkina Faso	AGE_235-49	1.078	0.195	0.385	0.701

In Kenya, neither of these measures are significantly predictive of adoption among non-users.

adopt_glm %>% 
  filter(COUNTRY == "Kenya") %>% 
    unnest(adopt) %>% 
    mutate_at(vars("estimate", "std.error", "statistic", "p.value"), 
              ~round(., 3)) %>%
    kable()

COUNTRY	term	estimate	std.error	statistic	p.value	sig
Kenya	(Intercept)	0.104	0.371	-6.093	0.000	***
Kenya	CVINCOMELOSS_2Yes	1.197	0.111	1.614	0.108
Kenya	COVIDCONCERN_2A little concerned	0.645	0.351	-1.249	0.213
Kenya	COVIDCONCERN_2Concerned	0.794	0.256	-0.900	0.369
Kenya	COVIDCONCERN_2Very concerned	0.907	0.254	-0.385	0.700
Kenya	URBANUrban	1.168	0.147	1.057	0.292
Kenya	WEALTHT_2Middle tertile	1.119	0.112	1.007	0.315
Kenya	WEALTHT_2Highest tertile	0.817	0.151	-1.344	0.180
Kenya	EDUCATTGEN_2Primary/Middle school	2.296	0.273	3.048	0.003	**
Kenya	EDUCATTGEN_2Secondary/post-primary	2.874	0.302	3.495	0.001	***
Kenya	EDUCATTGEN_2Tertiary/post-secondary	3.631	0.306	4.207	0.000	***
Kenya	AGE_225-34	3.057	0.128	8.710	0.000	***
Kenya	AGE_235-49	1.606	0.131	3.618	0.000	***

In Nigeria, …

adopt_glm %>% 
  filter(COUNTRY == "Nigeria") %>% 
    unnest(adopt) %>% 
    mutate_at(vars("estimate", "std.error", "statistic", "p.value"), 
              ~round(., 3)) %>%
    kable()

COUNTRY	term	estimate	std.error	statistic	p.value	sig
Nigeria	(Intercept)	0.006	0.707	-7.242	0.000	***
Nigeria	CVINCOMELOSS_2Yes	1.571	0.274	1.649	0.104
Nigeria	COVIDCONCERN_2A little concerned	1.138	0.440	0.293	0.770
Nigeria	COVIDCONCERN_2Concerned	0.881	0.393	-0.322	0.749
Nigeria	COVIDCONCERN_2Very concerned	1.057	0.398	0.140	0.889
Nigeria	URBANUrban	2.993	0.463	2.366	0.021	*
Nigeria	WEALTHT_2Middle tertile	0.996	0.254	-0.017	0.986
Nigeria	WEALTHT_2Highest tertile	1.060	0.236	0.249	0.805
Nigeria	EDUCATTGEN_2Primary/Middle school	3.543	0.427	2.965	0.004	**
Nigeria	EDUCATTGEN_2Secondary/post-primary	4.431	0.438	3.396	0.001	**
Nigeria	EDUCATTGEN_2Tertiary/post-secondary	5.465	0.403	4.212	0.000	***
Nigeria	AGE_225-34	3.493	0.247	5.054	0.000	***
Nigeria	AGE_235-49	2.318	0.245	3.429	0.001	**

4.2 Discontinuation among women who were using a method at Phase 1

What about method dicontinuation for women who were using a method at Phase 1?

stop_glm <- dat %>% 
    filter(CP_1 == 1) %>% 
    mutate(stop = FPSTATUS == "Discontinued") %>% 
    group_by(COUNTRY) %>%
    summarise(
        stop = cur_data() %>% 
        as_survey_design(weight = PANELWEIGHT, id = EAID_1, strata = STRATA_1)%>%
        svyglm(
            stop ~ CVINCOMELOSS_2 + COVIDCONCERN_2 + URBAN + WEALTHT_2 + EDUCATTGEN_2 + AGE_2,
            family = "quasibinomial", design = .
            )%>% 
        broom::tidy(exp = TRUE) %>% 
        mutate(sig = gtools::stars.pval(p.value)) %>% 
        list()
    )

#stop_glm

This time, neither of the COVID-19 measures are significantly associated with discontinuation for Phase 1 contraceptive users in Burkina Faso.

stop_glm %>% 
    filter(COUNTRY == "Burkina Faso") %>% 
    unnest(stop) %>% 
    mutate_at(vars("estimate", "std.error", "statistic", "p.value"), 
              ~round(., 3)) %>%
    kable()

COUNTRY	term	estimate	std.error	statistic	p.value	sig
Burkina Faso	(Intercept)	0.536	0.407	-1.535	0.127
Burkina Faso	CVINCOMELOSS_2Yes	0.857	0.185	-0.835	0.405
Burkina Faso	COVIDCONCERN_2A little concerned	1.182	0.442	0.379	0.705
Burkina Faso	COVIDCONCERN_2Concerned	0.922	0.425	-0.192	0.848
Burkina Faso	COVIDCONCERN_2Very concerned	0.935	0.335	-0.200	0.842
Burkina Faso	URBANUrban	0.951	0.231	-0.215	0.830
Burkina Faso	WEALTHT_2Middle tertile	1.469	0.211	1.824	0.070	.
Burkina Faso	WEALTHT_2Highest tertile	0.797	0.238	-0.952	0.343
Burkina Faso	EDUCATTGEN_2Primary/Middle school	1.294	0.212	1.215	0.226
Burkina Faso	EDUCATTGEN_2Secondary/post-primary	1.161	0.250	0.596	0.552
Burkina Faso	EDUCATTGEN_2Tertiary/post-secondary	0.787	0.289	-0.828	0.409
Burkina Faso	AGE_225-34	1.109	0.215	0.482	0.630
Burkina Faso	AGE_235-49	0.784	0.244	-0.997	0.320

However, higher levels concern with becoming infected with COVID-19 are significantly associated with higher odds of discontinuation for Phase 1 contraceptive users in Kenya.

stop_glm %>% 
    filter(COUNTRY == "Kenya") %>% 
    unnest(stop) %>% 
    mutate_at(vars("estimate", "std.error", "statistic", "p.value"), 
              ~round(., 3)) %>%
    kable()

COUNTRY	term	estimate	std.error	statistic	p.value	sig
Kenya	(Intercept)	0.098	0.877	-2.650	0.009	**
Kenya	CVINCOMELOSS_2Yes	1.007	0.158	0.043	0.965
Kenya	COVIDCONCERN_2A little concerned	7.683	0.694	2.937	0.004	**
Kenya	COVIDCONCERN_2Concerned	4.236	0.723	1.998	0.047	*
Kenya	COVIDCONCERN_2Very concerned	3.770	0.719	1.846	0.066	.
Kenya	URBANUrban	1.120	0.135	0.836	0.404
Kenya	WEALTHT_2Middle tertile	0.843	0.153	-1.115	0.266
Kenya	WEALTHT_2Highest tertile	0.888	0.180	-0.659	0.511
Kenya	EDUCATTGEN_2Primary/Middle school	0.787	0.349	-0.687	0.493
Kenya	EDUCATTGEN_2Secondary/post-primary	0.958	0.367	-0.118	0.907
Kenya	EDUCATTGEN_2Tertiary/post-secondary	1.099	0.397	0.238	0.812
Kenya	AGE_225-34	0.783	0.153	-1.603	0.110
Kenya	AGE_235-49	0.589	0.153	-3.450	0.001	***

In Nigeria, …

stop_glm %>% 
    filter(COUNTRY == "Nigeria") %>% 
    unnest(stop) %>% 
    mutate_at(vars("estimate", "std.error", "statistic", "p.value"), 
              ~round(., 3)) %>%
    kable()

COUNTRY	term	estimate	std.error	statistic	p.value	sig
Nigeria	(Intercept)	1.647	0.844	0.591	0.557
Nigeria	CVINCOMELOSS_2Yes	0.813	0.279	-0.744	0.460
Nigeria	COVIDCONCERN_2A little concerned	0.773	0.504	-0.511	0.611
Nigeria	COVIDCONCERN_2Concerned	0.659	0.529	-0.789	0.433
Nigeria	COVIDCONCERN_2Very concerned	0.710	0.409	-0.837	0.406
Nigeria	URBANUrban	0.569	0.526	-1.072	0.288
Nigeria	WEALTHT_2Middle tertile	1.413	0.349	0.990	0.326
Nigeria	WEALTHT_2Highest tertile	1.518	0.387	1.078	0.286
Nigeria	EDUCATTGEN_2Primary/Middle school	0.288	0.564	-2.203	0.032	*
Nigeria	EDUCATTGEN_2Secondary/post-primary	0.458	0.642	-1.216	0.229
Nigeria	EDUCATTGEN_2Tertiary/post-secondary	0.371	0.674	-1.469	0.148
Nigeria	AGE_225-34	1.291	0.412	0.620	0.538
Nigeria	AGE_235-49	1.042	0.409	0.101	0.920

For more R tips for IPUMS data, check out:

The IPUMS PMA blog
The ipumsr documentation website
The ipums tutorials page