This report examines the determinants of digital payment use using the Global Findex 2025 microdata. The objective is to demonstrate a reproducible workflow in R, including data cleaning, variable construction, survey-weighted descriptive analysis, logistic regression, and predicted-probability visualisation.
The analysis focuses on whether digital payment use varies systematically by education, income, labour-force participation, sex, age, and urban residence.
The analysis uses the Global Findex 2025 microdata, an individual-level survey covering 144,090 respondents across 141 economies. The Global Findex is a cross-country dataset on digital and financial inclusion, including measures of account ownership, digital payments, savings, borrowing, digital connectivity, and financial resilience. The unit of analysis is the individual respondent. Survey weights are used throughout to account for unequal probability of selection and post-stratification adjustments.
The outcome variable is anydigpayment, defined in the
codebook as whether the respondent made or received a digital payment in
the past year. Key predictors used in this report are sex, age,
education, within-economy income quintile, labour-force participation,
urban residence, internet use, and mobile phone ownership.
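library(tidyverse)
library(here)
library(janitor)
# import the raw microdata and standardise the column names
findex_raw <- read_csv(here("data_raw", "findex_microdata_2025.csv")) %>%
  clean_names()
# keep only the variables used in this report
findex_core <- findex_raw %>%
  select(economy, economycode, regionwb, wgt, anydigpayment, female, age, educ, inc_q, emp_in, urbanicity, internet_use, con1, account, dig_account)
glimpse(findex_core)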
## Rows: 144,090
## Columns: 15
## $ economy <chr> "Nicaragua", "Costa Rica", "Mali", "Kuwait", "Turkiye", …
## $ economycode <chr> "NIC", "CRI", "MLI", "KWT", "TUR", "TWN", "HND", "ITA", …
## $ regionwb <chr> "Latin America & Caribbean (excluding high income)", "La…
## $ wgt <dbl> 0.9273146, 1.3838843, 1.3234863, 1.4756932, 0.6518006, 0…
## $ anydigpayment <dbl> 0, 1, 0, NA, 1, NA, 0, NA, 0, 0, NA, 1, 1, NA, 0, 0, NA,…
## $ female <dbl> 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2, 1, 1, 1,…
## $ age <dbl> 53, 48, 40, 25, 72, 47, 57, 52, 60, 52, 28, 48, 78, 55, …
## $ educ <dbl> 1, 2, 1, 3, 2, 3, 1, 3, 2, 2, 3, 2, 1, 3, 2, 3, 3, 1, 2,…
## $ inc_q <dbl> 5, 3, 2, 5, 4, 3, 3, 5, 2, 3, 5, 3, 5, 2, 5, 5, 5, 2, 3,…
## $ emp_in <dbl> 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1,…
## $ urbanicity <dbl> 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 1,…
## $ internet_use <dbl> 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1,…
## $ con1 <dbl> 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1,…
## $ account <dbl> 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,…
## $ dig_account <dbl> 0, 0, 0, NA, 0, NA, 0, NA, 0, 0, NA, 1, 0, NA, 0, 0, NA,…
summary(findex_core)
## economy economycode regionwb wgt
## Length:144090 Length:144090 Length:144090 Min. :0.07817
## Class :character Class :character Class :character 1st Qu.:0.47814
## Mode :character Mode :character Mode :character Median :0.79420
## Mean :1.00000
## 3rd Qu.:1.28491
## Max. :7.54908
##
## anydigpayment female age educ
## Min. :0.000 Min. :1.000 Min. : 15.00 Min. :1.000
## 1st Qu.:0.000 1st Qu.:1.000 1st Qu.: 28.00 1st Qu.:1.000
## Median :1.000 Median :1.000 Median : 40.00 Median :2.000
## Mean :0.561 Mean :1.475 Mean : 43.07 Mean :1.945
## 3rd Qu.:1.000 3rd Qu.:2.000 3rd Qu.: 57.00 3rd Qu.:2.000
## Max. :1.000 Max. :2.000 Max. :100.00 Max. :3.000
## NA's :41136 NA's :588
## inc_q emp_in urbanicity internet_use
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :0.0000
## 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.0000
## Median :3.000 Median :1.000 Median :1.000 Median :1.0000
## Mean :3.195 Mean :1.401 Mean :1.438 Mean :0.7534
## 3rd Qu.:4.000 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:1.0000
## Max. :5.000 Max. :2.000 Max. :2.000 Max. :1.0000
## NA's :1020 NA's :4020 NA's :2526
## con1 account dig_account
## Min. :1.000 Min. :0.0000 Min. :0.000
## 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:0.000
## Median :1.000 Median :1.0000 Median :0.000
## Mean :1.105 Mean :0.7378 Mean :0.496
## 3rd Qu.:1.000 3rd Qu.:1.0000 3rd Qu.:1.000
## Max. :4.000 Max. :1.0000 Max. :1.000
## NA's :41136
Data cleaning note: The descriptive statistics and regressions are weighted using ‘wgt’, as supplied in the original dataset. According to the microdata documentation, these final weights combine base sampling weights with post-stratification adjustments.
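To make the role of the weights concrete, the sketch below compares an unweighted and a weighted mean on a small invented data frame. The values in `toy` are made up for illustration and are not taken from the Findex file; only the column name `wgt` mirrors the dataset.

```r
# Toy data (invented for illustration): 6 respondents with unequal weights
toy <- data.frame(
  paid = c(1, 1, 1, 0, 0, 0),
  wgt  = c(0.5, 0.5, 0.5, 1.5, 1.5, 1.5)
)

# Unweighted share: every respondent counts equally
unweighted <- mean(toy$paid)

# Weighted share: respondents with larger weights count for more
weighted <- weighted.mean(toy$paid, w = toy$wgt)

# The same weighted quantity computed by hand: sum(w * y) / sum(w)
by_hand <- sum(toy$wgt * toy$paid) / sum(toy$wgt)

print(c(unweighted = unweighted, weighted = weighted, by_hand = by_hand))
```

Here the respondents who did not pay digitally carry larger weights, so the weighted share is lower than the unweighted one.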
write_csv(findex_core, here("data_clean", "findex_core_raw.csv"))
findex <- read_csv(here("data_clean", "findex_core_raw.csv"))
#digital payment
findex <- findex %>%
mutate(digital_payment = if_else(anydigpayment == 1, 1, 0, missing = NA_real_))
#gender
findex <- findex %>%
mutate(female_bin = case_when(female == 1 ~ 1,
female == 2 ~ 0,
TRUE ~ NA_real_))
#education
findex <- findex %>%
mutate(educ_cat = factor (educ,
levels = c(1,2,3),
labels = c("Primary or less", "Secondary", "Tertiary+")))
#income quintile
findex <- findex %>%
mutate(income_quintile = factor(inc_q,
levels = c(1,2,3,4,5),
labels = c("Q1 poorest",
"Q2",
"Q3",
"Q4",
"Q5 richest")))
#employment
findex <- findex %>%
mutate(in_workforce = case_when(emp_in == 1~1,
emp_in == 2~0,
TRUE ~ NA_real_))
#rural / urban (urbanicity)
findex <- findex %>%
mutate(urban = case_when(urbanicity == 2~1,
urbanicity == 1~0,
TRUE ~ NA_real_))
#internet use
findex <- findex %>%
mutate(internet_use = if_else(internet_use == 1, 1, 0,
missing = NA_real_))
#mobile phone bin & factor (since the value is not binary, mindful that there are people who said don't know & refused to say anything).
findex <- findex %>%
mutate(mobile_phone_bin = case_when(con1 == 1 ~ 1,
con1 == 2 ~ 0,
con1 %in% c(3,4) ~ NA_real_,
TRUE ~ NA_real_))
findex <- findex %>%
mutate(mobile_phone_factor = factor(mobile_phone_bin,
levels = c(0,1),
labels = c("No mobile phone", "Has mobile phone")))
#age group (a missing age stays NA instead of falling into "55+")
findex <- findex %>%
  mutate(age_group = case_when(age < 25 ~ "15-24",
                               age < 40 ~ "25-39",
                               age < 55 ~ "40-54",
                               age >= 55 ~ "55+",
                               TRUE ~ NA_character_) %>%
           factor(levels = c("15-24", "25-39", "40-54", "55+")))
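As an aside on the age grouping, base R's `cut()` is one possible alternative to chained comparisons. A minimal sketch on an invented age vector follows; a missing age stays `NA` rather than being assigned to a group.

```r
# Invented ages covering each boundary, plus one missing value
age <- c(15, 24, 25, 39, 40, 54, 55, 100, NA)

# right = FALSE makes the intervals [15,25), [25,40), [40,55), [55,Inf)
age_group <- cut(age,
                 breaks = c(15, 25, 40, 55, Inf),
                 right  = FALSE,
                 labels = c("15-24", "25-39", "40-54", "55+"))

# Show how each age was classified, with NA listed explicitly
print(table(age_group, useNA = "ifany"))
```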
# create 'findex_analysis' by dropping observations with a missing outcome, age, or sex (rows missing other predictors are dropped when the models are fitted)
findex_analysis <- findex %>%
filter(!is.na(digital_payment), !is.na(age), !is.na(female_bin))
write_csv(findex_analysis, here("data_clean", "findex_digital_payment_analysis.csv"))
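One detail worth keeping in mind when the analysis file is written to CSV and read back: a CSV file stores only text, so factor columns such as ‘educ_cat’ return as character columns and any custom level order is lost. With the labels used here, alphabetical order happens to match the intended order, so the reference categories should be unaffected, but that is a coincidence of the labels. The sketch below illustrates the round trip on an invented data frame and shows `saveRDS()` as one possible alternative; the file paths are temporary files, not paths from this project.

```r
# Toy data frame with a factor whose level order is set explicitly
toy <- data.frame(
  educ_cat = factor(c("Tertiary+", "Primary or less", "Secondary"),
                    levels = c("Primary or less", "Secondary", "Tertiary+"))
)

csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

# Write the same object in both formats
write.csv(toy, csv_path, row.names = FALSE)
saveRDS(toy, rds_path)

# Read both back
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
from_rds <- readRDS(rds_path)

class(from_csv$educ_cat)   # character: levels are not stored in a CSV
class(from_rds$educ_cat)   # factor: levels and their order are preserved
```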
The Global Findex is not a simple random sample. It is a complex survey designed to represent the population of each economy.
The documentation explains that respondents are selected through multi-stage sampling and that the final weights correct for unequal probability of selection and nonresponse.
Therefore, all descriptive statistics and regressions are estimated with survey-weighted methods. The design declared below supplies the weights only (‘ids = ~1’), so the standard errors do not account for clustering from the multi-stage sampling.
# load the clean analysis file into 'df'
df <- read_csv(here("data_clean", "findex_digital_payment_analysis.csv"))
# declare the survey design with the 'survey' package and store it as 'design'
library(survey)
design <- svydesign(ids = ~1,
                    weights = ~wgt,
                    data = df)
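The sketch below shows what the ‘ids’ argument changes, using simulated data. The column `psu` is a hypothetical cluster identifier invented for this example; we have not checked whether the public microdata file provides cluster or strata identifiers. The point estimate is the same under both designs; only the standard error differs.

```r
library(survey)

set.seed(1)

# Simulated data: 20 hypothetical clusters of 10 respondents each,
# with a cluster-specific probability of paying digitally
toy <- data.frame(
  psu  = rep(1:20, each = 10),
  wgt  = runif(200, min = 0.2, max = 3),
  paid = rbinom(200, size = 1, prob = rep(runif(20, 0.2, 0.8), each = 10))
)

# Weights only, as in the design used in this report
design_weights_only <- svydesign(ids = ~1, weights = ~wgt, data = toy)

# Weights plus the hypothetical cluster identifier
design_clustered <- svydesign(ids = ~psu, weights = ~wgt, data = toy)

est_weights_only <- svymean(~paid, design_weights_only)
est_clustered    <- svymean(~paid, design_clustered)

print(est_weights_only)
print(est_clustered)
```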
First, let’s look at the overall prevalence.
svymean(x = ~digital_payment, design = design, na.rm = TRUE)
## mean SE
## digital_payment 0.52842 0.0019
Second, let’s look at subgroups defined by ‘education category’, ‘income quintile’, and ‘age group’.
svyby(formula = ~digital_payment, by = ~educ_cat, design = design, svymean, na.rm = TRUE)
## educ_cat digital_payment se
## Primary or less Primary or less 0.3727747 0.003131875
## Secondary Secondary 0.6038913 0.002563764
## Tertiary+ Tertiary+ 0.8403202 0.003698417
svyby(formula = ~digital_payment, by = ~income_quintile, design = design, svymean, na.rm = TRUE)
## income_quintile digital_payment se
## Q1 poorest Q1 poorest 0.3925156 0.004472808
## Q2 Q2 0.4703337 0.004467954
## Q3 Q3 0.5278006 0.004334823
## Q4 Q4 0.5818216 0.004200231
## Q5 richest Q5 richest 0.6656116 0.003799194
svyby(formula = ~digital_payment, by = ~age_group, design = design, svymean, na.rm = TRUE)
## age_group digital_payment se
## 15-24 15-24 0.4362003 0.004006695
## 25-39 25-39 0.5971618 0.003316999
## 40-54 40-54 0.5537016 0.003996658
## 55+ 55+ 0.5068094 0.004151287
Digital payment use rises sharply with education. The weighted descriptive pattern suggests a strong gradient, with respondents who have tertiary education showing substantially higher digital payment use than those with primary education or less. A similar gradient appears across the within-economy income distribution, with higher-income respondents more likely to use digital payments.
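To gauge how precisely these shares are estimated, approximate 95% confidence intervals can be computed as the estimate plus or minus 1.96 standard errors. The sketch below does this for the education estimates, with the numbers copied from the `svyby()` output above. It relies on a normal approximation and on standard errors that ignore clustering, so the intervals are probably somewhat too narrow.

```r
# Estimates and standard errors copied from the education svyby() output
edu <- data.frame(
  educ_cat = c("Primary or less", "Secondary", "Tertiary+"),
  est      = c(0.3727747, 0.6038913, 0.8403202),
  se       = c(0.003131875, 0.002563764, 0.003698417)
)

# Normal-approximation 95% confidence limits
edu$lower <- edu$est - 1.96 * edu$se
edu$upper <- edu$est + 1.96 * edu$se

print(edu)
```

The intervals for the three education groups do not overlap, which is consistent with the strong gradient described above.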
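In summary, the descriptive gaps are widest for education (37% to 84%), followed by income (39% to 67%) and age (44% to 60%, peaking in the 25-39 group). These are unadjusted comparisons, so the next step is to check whether they hold in regression models that condition on the other characteristics.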
We use survey-weighted logistic regression to build the models. ‘m1’ is the primary model studied in this analysis. ‘m2’ is a secondary model with additional variables (internet use, mobile phone ownership, and digital account ownership) that might be useful for further analysis.
# m1 - create a primary model to study
m1 <- svyglm(formula = digital_payment ~ female_bin + age + educ_cat + income_quintile + in_workforce + urban,
design = design,
family = quasibinomial())
# m2 - create a secondary model, with additional variables to study
m2 <- svyglm(formula = digital_payment ~ female_bin + age + educ_cat + income_quintile + in_workforce + urban + internet_use + mobile_phone_bin + dig_account,
design = design,
family = quasibinomial())
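A note on ‘family = quasibinomial()’: with non-integer weights, the binomial family in R warns about non-integer numbers of successes, whereas the quasibinomial family fits the same coefficients without the warning. The sketch below illustrates this with plain `glm()` on simulated data. `glm()` is only a stand-in here; `svyglm()` additionally computes design-based standard errors.

```r
set.seed(123)
n <- 500

# Simulated binary outcome with one predictor and non-integer weights
toy <- data.frame(x = rnorm(n), wgt = runif(n, min = 0.2, max = 3))
toy$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * toy$x))

# Record whether the binomial fit raises a warning
warned <- tryCatch({
  glm(y ~ x, family = binomial(), data = toy, weights = wgt)
  FALSE
}, warning = function(w) TRUE)

# Fit both families (silencing the binomial warning this time)
fit_binom <- suppressWarnings(
  glm(y ~ x, family = binomial(), data = toy, weights = wgt)
)
fit_quasi <- glm(y ~ x, family = quasibinomial(), data = toy, weights = wgt)

print(warned)
print(rbind(binomial = coef(fit_binom), quasibinomial = coef(fit_quasi)))
```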
library(gtsummary)
tbl_regression(m1, exponentiate = TRUE) %>%
  modify_caption("**Determinants of Digital Payment Use**") %>%
  modify_header(label = "**Characteristics**") %>%
  modify_footnote(everything() ~ "Odds ratios reported. Estimates are weighted using survey sampling weights. Reference categories used are primary education or less, poorest income quintile, out of workforce, rural residence, and male respondents")
| Characteristics¹ | OR¹ | 95% CI¹ | p-value¹ |
|---|---|---|---|
| female_bin | 0.87 | 0.84, 0.90 | <0.001 |
| age | 1.01 | 1.01, 1.01 | <0.001 |
| educ_cat | | | |
| Primary or less | — | — | |
| Secondary | 2.78 | 2.68, 2.88 | <0.001 |
| Tertiary+ | 7.03 | 6.60, 7.49 | <0.001 |
| income_quintile | | | |
| Q1 poorest | — | — | |
| Q2 | 1.26 | 1.19, 1.33 | <0.001 |
| Q3 | 1.42 | 1.34, 1.50 | <0.001 |
| Q4 | 1.63 | 1.54, 1.72 | <0.001 |
| Q5 richest | 1.96 | 1.86, 2.07 | <0.001 |
| in_workforce | 2.23 | 2.15, 2.31 | <0.001 |
| urban | 1.14 | 1.10, 1.18 | <0.001 |

¹ Odds ratios reported. Estimates are weighted using survey sampling weights. Reference categories used are primary education or less, poorest income quintile, out of workforce, rural residence, and male respondents.
Abbreviations: CI = Confidence Interval, OR = Odds Ratio
tbl_regression(m2, exponentiate = TRUE) %>%
  modify_caption("**Determinants of Digital Payment Use (secondary model m2)**") %>%
  modify_header(label = "**Characteristics**") %>%
  modify_footnote(everything() ~ "Odds ratios reported. Estimates are weighted using survey sampling weights. Reference categories used are primary education or less, poorest income quintile, out of workforce, rural residence, male respondents, non internet users, and non mobile phone users.")
| Characteristics¹ | OR¹ | 95% CI¹ | p-value¹ |
|---|---|---|---|
| female_bin | 0.87 | 0.83, 0.92 | <0.001 |
| age | 1.03 | 1.03, 1.03 | <0.001 |
| educ_cat | | | |
| Primary or less | — | — | |
| Secondary | 1.44 | 1.36, 1.53 | <0.001 |
| Tertiary+ | 2.41 | 2.18, 2.65 | <0.001 |
| income_quintile | | | |
| Q1 poorest | — | — | |
| Q2 | 1.14 | 1.05, 1.24 | 0.002 |
| Q3 | 1.17 | 1.08, 1.27 | <0.001 |
| Q4 | 1.24 | 1.14, 1.35 | <0.001 |
| Q5 richest | 1.35 | 1.24, 1.46 | <0.001 |
| in_workforce | 1.44 | 1.37, 1.52 | <0.001 |
| urban | 0.92 | 0.88, 0.98 | 0.004 |
| internet_use | 1.44 | 1.35, 1.53 | <0.001 |
| mobile_phone_bin | 1.47 | 1.35, 1.59 | <0.001 |
| dig_account | 85.4 | 80.0, 91.1 | <0.001 |

¹ Odds ratios reported. Estimates are weighted using survey sampling weights. Reference categories used are primary education or less, poorest income quintile, out of workforce, rural residence, male respondents, non internet users, and non mobile phone users.
Abbreviations: CI = Confidence Interval, OR = Odds Ratio
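Regression table notes: In m2, internet use (OR 1.44) and mobile phone ownership (OR 1.47) are both positively associated with digital payment use, and the education and income odds ratios shrink relative to m1 (for example, Tertiary+ falls from 7.03 to 2.41). The digital account odds ratio (85.4) is less informative: respondents who have a digital account are very likely to have made or received a digital payment, so this association is largely mechanical.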
The tables above report odds ratios from survey-weighted logistic regressions. Odds ratios greater than 1 indicate a positive association with digital payment use, while odds ratios below 1 indicate a negative association.
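Odds ratios act on odds rather than on probabilities, so the same odds ratio implies a different change in probability depending on the starting point. The sketch below converts a baseline probability and an odds ratio into the implied probability. `or_to_prob()` is a helper written for this illustration, and the baseline probabilities of 0.2 and 0.6 are arbitrary; the odds ratio 2.78 is the ‘Secondary’ estimate from m1.

```r
# Convert a baseline probability and an odds ratio into the implied probability
or_to_prob <- function(p0, or) {
  odds0 <- p0 / (1 - p0)   # baseline odds
  odds1 <- odds0 * or      # odds after applying the odds ratio
  odds1 / (1 + odds1)      # back to a probability
}

# Sanity check: doubling even odds (p = 0.5) gives a probability of 2/3
check <- or_to_prob(p0 = 0.5, or = 2)

# The same odds ratio applied at two arbitrary baselines
low_base  <- or_to_prob(p0 = 0.2, or = 2.78)
high_base <- or_to_prob(p0 = 0.6, or = 2.78)

print(c(check = check, low_base = low_base, high_base = high_base))
```

This is one reason the report later turns to predicted probabilities, which are easier to read than odds ratios.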
Relative to respondents with primary education or less, those with secondary and tertiary education exhibit substantially higher odds of digital payment use (odds ratios of 2.78 and 7.03 in m1). Higher income quintiles and labour-force participation are also positively associated with digital payment use, while women show slightly lower odds than men (OR 0.87), conditional on the included covariates.
library(scales)
# weighted share using digital payments, by education level
edu_plot_data <- svyby(formula = ~digital_payment,
                       by = ~educ_cat,
                       design = design,
                       svymean,
                       na.rm = TRUE) %>%
  as_tibble()
edu_plot_data %>%
  ggplot(aes(x = educ_cat, y = digital_payment)) +
  geom_col() +
  scale_y_continuous(labels = percent_format()) +
  labs(title = "Weighted Digital Payment Use by Education Level",
       x = NULL,
       y = "share using digital payments") +
  theme_classic()
ggsave(here("output", "figures", "digital_payment_by_education.png"))
income_plot_data <- svyby(formula = ~digital_payment,
by = ~income_quintile,
design = design,
svymean,
na.rm = TRUE) %>%
as_tibble()
income_plot_data %>%
ggplot(aes(x = income_quintile, y = digital_payment)) +
geom_col() +
scale_y_continuous(labels = percent_format()) +
labs (title = "Weighted Digital Payment Use by Income Quintile",
x = "Income groups (in quintile)",
y = "share using digital payments")+
theme_classic()
ggsave(here("output", "figures", "digital_payment_by_income_quintile.png"))
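The `svyby()` output also carries a standard error column (‘se’), which could be used to add error bars to these charts. The sketch below does so for the income quintiles, with the estimates typed in from the output shown earlier so that it runs without the survey data. The 1.96 multiplier assumes a normal approximation.

```r
library(ggplot2)

# Income estimates and standard errors copied from the svyby() output
income_est <- data.frame(
  income_quintile = factor(c("Q1 poorest", "Q2", "Q3", "Q4", "Q5 richest"),
                           levels = c("Q1 poorest", "Q2", "Q3", "Q4", "Q5 richest")),
  digital_payment = c(0.3925156, 0.4703337, 0.5278006, 0.5818216, 0.6656116),
  se              = c(0.004472808, 0.004467954, 0.004334823, 0.004200231, 0.003799194)
)

# Bars for the weighted shares, with approximate 95% intervals on top
p <- ggplot(income_est, aes(x = income_quintile, y = digital_payment)) +
  geom_col() +
  geom_errorbar(aes(ymin = digital_payment - 1.96 * se,
                    ymax = digital_payment + 1.96 * se),
                width = 0.2) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(title = "Weighted Digital Payment Use by Income Quintile",
       x = "Income groups (in quintile)",
       y = "share using digital payments") +
  theme_classic()

print(p)
```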
To complement the odds-ratio table, predicted probabilities are calculated from the fitted model while holding other covariates constant. This helps translate regression coefficients into more intuitive quantities.
The question we’re trying to answer is: **“given the model, what is the predicted chance of using digital payments for different groups, while holding other variables fixed?”** The primary model (m1) will be used in this analysis.
Finally, the basic workflow used to generate the predicted probabilities is the following:
- fit the model,
- create a new dataset describing the groups to compare,
- use the ‘predict()’ function to generate the predicted probabilities,
- attach the fitted probabilities to the new dataset,
- and plot them with the ‘ggplot()’ function.
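For intuition about what ‘predict()’ returns with ‘type = "response"’, the sketch below reproduces the predicted probabilities by hand: it multiplies the design matrix of the new rows by the fitted coefficients and applies the inverse logit. It uses plain `glm()` on simulated data as a stand-in for the survey-weighted model, and all variable names and values are invented.

```r
set.seed(7)
n <- 400

# Simulated data with a three-level factor and a numeric covariate
toy <- data.frame(
  educ = factor(sample(c("low", "mid", "high"), n, replace = TRUE),
                levels = c("low", "mid", "high")),
  age  = round(runif(n, min = 15, max = 80))
)
toy$y <- rbinom(n, size = 1,
                prob = plogis(-1 + 0.8 * (toy$educ == "mid") +
                                1.6 * (toy$educ == "high") + 0.01 * toy$age))

# Step 1: fit the model
fit <- glm(y ~ educ + age, family = binomial(), data = toy)

# Step 2: one row per education level, age held at its mean
new_rows <- data.frame(
  educ = factor(c("low", "mid", "high"), levels = levels(toy$educ)),
  age  = mean(toy$age)
)

# Step 3: predicted probabilities from predict()
p_predict <- predict(fit, newdata = new_rows, type = "response")

# The same numbers by hand: inverse logit of (design matrix %*% coefficients)
X <- model.matrix(~ educ + age, data = new_rows)
p_manual <- plogis(drop(X %*% coef(fit)))

# Step 4: attach the probabilities to the new rows
new_rows$pred_prob <- as.numeric(p_predict)
print(new_rows)
```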
Comparing three education levels, for male respondents at average age, in the middle income quintile, in the workforce, and living in an urban area.
# 1) pred_educ - create the dataset
pred_educ <- tibble(female_bin = 0,
age = mean(findex_analysis$age, na.rm = TRUE),
educ_cat = factor(c("Primary or less", "Secondary", "Tertiary+"),
levels = levels(findex_analysis$educ_cat)),
income_quintile = factor("Q3", levels = levels(findex_analysis$income_quintile)),
in_workforce = 1,
urban = 1)
# 2) pred_prob - generate the predicted probabilities using the predict() function and attach them to pred_educ as plain numbers
pred_educ$pred_prob <- as.numeric(predict(m1,
                                          newdata = pred_educ,
                                          type = "response"))
# 3) plot the predicted probabilities
pred_educ %>%
ggplot(aes(x = educ_cat, y = pred_prob)) +
geom_col() +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
labs(title = "Predicted Probability of Digital Payment Use by Education Level",
subtitle = "Predictions from survey-weighted logistic regression, holding other covariates constant",
x = NULL,
y = "predicted probability") +
theme_classic()
Takeaway: The figure above shows model-predicted probabilities of digital payment use by education level, holding age, income quintile, workforce status, sex, and urban residence constant. The predicted probability rises sharply with education, indicating that educational attainment is strongly associated with digital financial inclusion.
ggsave(here("output", "figures", "prediction_probabilities_by_education.png"))
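The bars above show point predictions only. One common way to add uncertainty is to predict on the link (log-odds) scale, build the interval there, and transform both limits back to probabilities. The sketch below does this with plain `glm()` on simulated data. It is a stand-in for the survey-weighted model rather than a reproduction of it, and the variable names and values are invented.

```r
set.seed(11)
n <- 600

# Simulated data with the same education labels used in this report
lv <- c("Primary or less", "Secondary", "Tertiary+")
toy <- data.frame(
  educ = factor(sample(lv, n, replace = TRUE), levels = lv),
  age  = round(runif(n, min = 15, max = 80))
)
toy$y <- rbinom(n, size = 1,
                prob = plogis(-0.8 + 1.0 * (toy$educ == "Secondary") +
                                2.0 * (toy$educ == "Tertiary+")))

fit <- glm(y ~ educ + age, family = binomial(), data = toy)

# One row per education level, age held at its mean
grid <- data.frame(educ = factor(lv, levels = lv), age = mean(toy$age))

# Predict on the log-odds scale together with standard errors
link <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)

# Transform the estimate and the interval limits back to probabilities
grid$pred_prob <- plogis(link$fit)
grid$lower     <- plogis(link$fit - 1.96 * link$se.fit)
grid$upper     <- plogis(link$fit + 1.96 * link$se.fit)

print(grid)
```

Building the interval on the link scale keeps both limits between 0 and 1.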
Comparing five income quintiles, for male respondents at average age, with secondary education level, in the workforce, and living in an urban area.
# 1) create the dataset
pred_income <- tibble(female_bin = 0,
age = mean(findex_analysis$age, na.rm = TRUE),
educ_cat = factor("Secondary", levels = levels(findex_analysis$educ_cat)),
income_quintile = factor(c("Q1 poorest", "Q2", "Q3", "Q4", "Q5 richest"),
levels = levels(findex_analysis$income_quintile)),
in_workforce = 1,
urban = 1)
# 2) generate the predicted probabilities and attach them to pred_income as plain numbers
pred_income$pred_prob <- as.numeric(predict(m1,
                                            newdata = pred_income,
                                            type = "response"))
# 3) plot the predicted probabilities
pred_income %>%
ggplot(aes(x = income_quintile, y = pred_prob)) +
geom_col() +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
labs(title = "Predicted Probability of Digital Payment Use by Income Quintile",
subtitle = "Predictions from survey-weighted logistic regression, holding other covariates constant",
x = NULL,
y = "predicted probability") +
theme_classic()
Takeaway: The figure above shows model-predicted probabilities of digital payment use by income level, holding age, education level, workforce status, sex, and urban residence constant. The predicted probability rises with income, indicating that income is associated with digital financial inclusion.
ggsave(here("output", "figures", "prediction_probabilities_by_income_quintile.png"))
The same analysis and plots could be produced for internet use and mobile phone ownership by adapting the code above and predicting from ‘m2’, since those variables are not part of ‘m1’. To keep this report concise, we limit the analysis to education level and income quintile.
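If the predicted-probability steps were repeated for several variables, they could be wrapped in a small helper. The sketch below is one possible shape for such a helper. `predict_by_level()` is a hypothetical function written for this illustration, it assumes the variable of interest is stored as a factor, and it is demonstrated with plain `glm()` on simulated data rather than with the survey-weighted models.

```r
# Hypothetical helper: predicted probability for each level of one factor,
# with the remaining covariates held at the values supplied in `fixed`
predict_by_level <- function(model, data, var, fixed) {
  lv <- levels(data[[var]])
  grid <- data.frame(level = factor(lv, levels = lv))
  names(grid) <- var
  for (nm in names(fixed)) {
    grid[[nm]] <- fixed[[nm]]   # a single value is recycled to every row
  }
  grid$pred_prob <- as.numeric(predict(model, newdata = grid, type = "response"))
  grid
}

set.seed(3)
n <- 500

# Simulated data (all names and values invented for illustration)
toy <- data.frame(
  internet     = factor(sample(c("No internet", "Uses internet"), n, replace = TRUE)),
  age          = round(runif(n, min = 15, max = 80)),
  in_workforce = rbinom(n, size = 1, prob = 0.6)
)
toy$y <- rbinom(n, size = 1,
                prob = plogis(-1 + 1.2 * (toy$internet == "Uses internet") +
                                0.5 * toy$in_workforce))

fit <- glm(y ~ internet + age + in_workforce, family = binomial(), data = toy)

out <- predict_by_level(fit, toy, var = "internet",
                        fixed = list(age = mean(toy$age), in_workforce = 1))
print(out)
```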
This analysis finds clear socioeconomic gradients in digital payment use. Education is one of the strongest correlates, as respondents with higher educational attainment are much more likely to use digital payments. Income gradients are also substantial, both descriptively and in model-based predicted probabilities. In addition, in the primary model, workforce participation and urban residence are positively associated with digital payment use, while women exhibit slightly lower odds than men after controlling for observed characteristics.
Overall, the results suggest that digital financial inclusion remains closely linked to educational attainment, labour-market attachment, and economic resources.
This analysis is descriptive and associational rather than causal. The Global Findex is a cross-sectional survey, so the results should not be interpreted as causal effects of education or income on digital payment use.
In addition, the survey mode differs across contexts, with face-to-face interviews in many low- and middle-income economies and phone surveys in many high-income economies, so some cross-economy differences may reflect the interview mode rather than behaviour.
Finally, the outcome variable captures whether respondents made or received a digital payment in the past year, but does not measure frequency or intensity of use.
This report was produced in R Markdown from a single master script that performs data import, cleaning, variable construction, survey-weighted estimation, and figure generation. All outputs are generated programmatically from the raw microdata.
World Bank. (2025). The Global Findex Database 2025: Connectivity and financial inclusion in the digital economy [Microdata]. World Bank Microdata Library. https://microdata.worldbank.org/catalog/7860/get-microdata
World Bank. (2025). The Global Findex Database 2025: Connectivity and financial inclusion in the digital economy (Study description). World Bank Microdata Library. https://microdata.worldbank.org/catalog/7860/study-description