1 Overview

This report examines the determinants of digital payment use using the Global Findex 2025 microdata. The objective is to demonstrate a reproducible workflow in R, including data cleaning, variable construction, survey-weighted descriptive analysis, logistic regression, and predicted-probability visualisation.

The analysis focuses on whether digital payment use varies systematically by education, income, labour-force participation, sex, age, and urban residence.


2 Data and Variable Description

The analysis uses the Global Findex 2025 microdata, an individual-level survey covering 144,090 respondents across 141 economies. The Global Findex is a cross-country dataset on digital and financial inclusion, including measures of account ownership, digital payments, savings, borrowing, digital connectivity, and financial resilience. The unit of analysis is the individual respondent. Survey weights are used throughout to account for unequal probability of selection and post-stratification adjustments.

The outcome variable is anydigpayment, defined in the codebook as whether the respondent made or received a digital payment in the past year. Key predictors used in this report are sex, age, education, within-economy income quintile, labour-force participation, urban residence, internet use, and mobile phone ownership.

2.1 Data Import and Cleaning

2.1.1 Performing the initial cleaning and keeping only relevant columns

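library(tidyverse)  # read_csv(), dplyr verbs, ggplot2
library(here)       # project-relative file paths
library(janitor)    # clean_names()

findex_raw <- read_csv(here("data_raw", "findex_microdata_2025.csv")) %>%
  clean_names()

findex_core <- findex_raw %>%
  select(economy, economycode, regionwb, wgt, anydigpayment, female, age, educ, inc_q, emp_in, urbanicity, internet_use, con1, account, dig_account)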

2.1.2 Inspecting the new dataset

glimpse(findex_core)
## Rows: 144,090
## Columns: 15
## $ economy       <chr> "Nicaragua", "Costa Rica", "Mali", "Kuwait", "Turkiye", …
## $ economycode   <chr> "NIC", "CRI", "MLI", "KWT", "TUR", "TWN", "HND", "ITA", …
## $ regionwb      <chr> "Latin America & Caribbean (excluding high income)", "La…
## $ wgt           <dbl> 0.9273146, 1.3838843, 1.3234863, 1.4756932, 0.6518006, 0…
## $ anydigpayment <dbl> 0, 1, 0, NA, 1, NA, 0, NA, 0, 0, NA, 1, 1, NA, 0, 0, NA,…
## $ female        <dbl> 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2, 1, 1, 1,…
## $ age           <dbl> 53, 48, 40, 25, 72, 47, 57, 52, 60, 52, 28, 48, 78, 55, …
## $ educ          <dbl> 1, 2, 1, 3, 2, 3, 1, 3, 2, 2, 3, 2, 1, 3, 2, 3, 3, 1, 2,…
## $ inc_q         <dbl> 5, 3, 2, 5, 4, 3, 3, 5, 2, 3, 5, 3, 5, 2, 5, 5, 5, 2, 3,…
## $ emp_in        <dbl> 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1,…
## $ urbanicity    <dbl> 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 1,…
## $ internet_use  <dbl> 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1,…
## $ con1          <dbl> 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1,…
## $ account       <dbl> 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,…
## $ dig_account   <dbl> 0, 0, 0, NA, 0, NA, 0, NA, 0, 0, NA, 1, 0, NA, 0, 0, NA,…
summary(findex_core)
##    economy          economycode          regionwb              wgt         
##  Length:144090      Length:144090      Length:144090      Min.   :0.07817  
##  Class :character   Class :character   Class :character   1st Qu.:0.47814  
##  Mode  :character   Mode  :character   Mode  :character   Median :0.79420  
##                                                           Mean   :1.00000  
##                                                           3rd Qu.:1.28491  
##                                                           Max.   :7.54908  
##                                                                            
##  anydigpayment       female           age              educ      
##  Min.   :0.000   Min.   :1.000   Min.   : 15.00   Min.   :1.000  
##  1st Qu.:0.000   1st Qu.:1.000   1st Qu.: 28.00   1st Qu.:1.000  
##  Median :1.000   Median :1.000   Median : 40.00   Median :2.000  
##  Mean   :0.561   Mean   :1.475   Mean   : 43.07   Mean   :1.945  
##  3rd Qu.:1.000   3rd Qu.:2.000   3rd Qu.: 57.00   3rd Qu.:2.000  
##  Max.   :1.000   Max.   :2.000   Max.   :100.00   Max.   :3.000  
##  NA's   :41136                                    NA's   :588    
##      inc_q           emp_in        urbanicity     internet_use   
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :0.0000  
##  1st Qu.:2.000   1st Qu.:1.000   1st Qu.:1.000   1st Qu.:1.0000  
##  Median :3.000   Median :1.000   Median :1.000   Median :1.0000  
##  Mean   :3.195   Mean   :1.401   Mean   :1.438   Mean   :0.7534  
##  3rd Qu.:4.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:1.0000  
##  Max.   :5.000   Max.   :2.000   Max.   :2.000   Max.   :1.0000  
##  NA's   :1020    NA's   :4020    NA's   :2526                    
##       con1          account        dig_account   
##  Min.   :1.000   Min.   :0.0000   Min.   :0.000  
##  1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:0.000  
##  Median :1.000   Median :1.0000   Median :0.000  
##  Mean   :1.105   Mean   :0.7378   Mean   :0.496  
##  3rd Qu.:1.000   3rd Qu.:1.0000   3rd Qu.:1.000  
##  Max.   :4.000   Max.   :1.0000   Max.   :1.000  
##                                   NA's   :41136

Data cleaning note: The descriptive statistics and regressions are weighted using ‘wgt’, as provided in the original dataset. According to the microdata documentation, these final weights combine base sampling weights and post-stratification adjustments.
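
To illustrate what weighting does, the sketch below compares an unweighted and a weighted mean on a toy vector. The values and the names `paid` and `w` are made up for illustration and are not Findex data.

```r
# Toy example (hypothetical values, not Findex data)
paid <- c(1, 0, 1, 1, 0)            # 1 = used a digital payment
w    <- c(0.5, 2.0, 0.5, 1.0, 1.0)  # survey weights with mean 1, like wgt

unweighted <- mean(paid)              # every respondent counts equally
weighted   <- weighted.mean(paid, w)  # respondents count in proportion to w

c(unweighted = unweighted, weighted = weighted)
```

Here the respondent with the largest weight did not use digital payments, so the weighted share is lower than the unweighted one.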

2.1.3 Saving the new dataset as an interim file

write_csv(findex_core, here("data_clean", "findex_core_raw.csv"))

2.2 Constructing Variables

2.2.1 Load the interim file into a new dataset called ‘findex’

findex <- read_csv(here("data_clean", "findex_core_raw.csv"))

2.2.2 Constructing the variables in cleaned forms

#digital payment
findex <- findex %>%
  mutate(digital_payment = if_else(anydigpayment == 1, 1, 0, missing = NA_real_))

#gender
findex <- findex %>%
  mutate(female_bin = case_when(female == 1 ~ 1,
                                female == 2 ~ 0,
                                TRUE ~ NA_real_))
#education
findex <- findex %>%
  mutate(educ_cat = factor (educ, 
                            levels = c(1,2,3), 
                            labels = c("Primary or less", "Secondary", "Tertiary+")))
#income quintile
findex <- findex %>%
  mutate(income_quintile = factor(inc_q, 
                                  levels = c(1,2,3,4,5),
                                  labels = c("Q1 poorest",
                                             "Q2",
                                             "Q3",
                                             "Q4",
                                             "Q5 richest")))
#employment
findex <- findex %>%
  mutate(in_workforce = case_when(emp_in == 1~1,
                                  emp_in == 2~0,
                                  TRUE ~ NA_real_))
#rural / urban residence
findex <- findex %>%
  mutate(urban = case_when(urbanicity == 2~1,
                           urbanicity == 1~0,
                           TRUE ~ NA_real_))
#internet use
findex <- findex %>%
  mutate(internet_use = if_else(internet_use == 1, 1, 0, 
                                missing = NA_real_))
#mobile phone: binary and factor versions (con1 is not binary; codes 3 and 4, covering "don't know" and "refused", are set to NA)
findex <- findex %>%
  mutate(mobile_phone_bin = case_when(con1 == 1 ~ 1,
                                      con1 == 2 ~ 0,
                                      con1 %in% c(3,4) ~ NA_real_,
                                      TRUE ~ NA_real_))
findex <- findex %>%
  mutate(mobile_phone_factor = factor(mobile_phone_bin, 
                                      levels = c(0,1),
                                      labels = c("No mobile phone", "Has mobile phone")))
#age group
findex <- findex %>%
  mutate(age_group = case_when(age < 25 ~ "15-24",
                               age < 40 ~ "25-39",
                               age < 55 ~ "40-54",
                               age >= 55 ~ "55+",
                               TRUE ~ NA_character_) %>%  # a missing age stays NA instead of falling into "55+"
           factor(levels = c("15-24", "25-39", "40-54", "55+")))

# create 'findex_analysis', dropping observations with a missing outcome, age, or sex
findex_analysis <- findex %>%
  filter(!is.na(digital_payment), !is.na(age), !is.na(female_bin))
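
A quick way to check a recode such as `female_bin` is to cross-tabulate the raw and recoded columns. The sketch below uses a toy data frame (hypothetical values) with the same raw coding that the recode above implies, 1 = female and 2 = male, and base R only.

```r
# Toy data using the raw coding implied by the report's recode (1 = female, 2 = male)
toy <- data.frame(female = c(1, 2, 2, 1, NA))
toy$female_bin <- ifelse(toy$female == 1, 1,
                         ifelse(toy$female == 2, 0, NA))

# Rows: raw codes; columns: recoded values; NA is counted explicitly
check <- table(raw = toy$female, recoded = toy$female_bin, useNA = "ifany")
check
```

Each raw code should land in exactly one recoded column, and missing raw values should stay missing.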

2.2.3 Saving the new dataset with the cleaned variables

write_csv(findex_analysis, here("data_clean", "findex_digital_payment_analysis.csv"))

3 Analysis of digital payments

Global Findex is not a simple random sample. It is a complex survey designed to represent the population of each economy.

The documentation explains that respondents are selected through multi-stage sampling and that the final weights correct for unequal probability of selection and nonresponse.

Therefore, all descriptive statistics and regressions are estimated using survey-weighted methods.
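
Unequal weights also reduce precision relative to an equally weighted sample of the same size. As a rough illustration that is not part of the original analysis, Kish's approximation gives an effective sample size from the weights alone. The helper name `kish_n_eff` and the weights below are hypothetical.

```r
# Kish's approximate effective sample size:
# (sum of weights)^2 / (sum of squared weights)
kish_n_eff <- function(w) {
  w <- w[!is.na(w)]
  sum(w)^2 / sum(w^2)
}

w_equal   <- rep(1, 100)                    # equal weights: no loss of precision
w_unequal <- c(rep(0.5, 50), rep(1.5, 50))  # hypothetical unequal weights

n_equal   <- kish_n_eff(w_equal)
n_unequal <- kish_n_eff(w_unequal)
c(equal = n_equal, unequal = n_unequal)
```

In a real run, the same helper could be applied to the `wgt` column to get a sense of how much the weights vary.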

3.1 Apply survey design object

# load the clean analysis file into 'df'
df <- read_csv(here("data_clean", "findex_digital_payment_analysis.csv"))

# create the survey design object 'design' with the survey package
library(survey)
design <- svydesign(ids = ~1,
                    weights = ~wgt,
                    data = df)
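
One thing to keep in mind when reloading an analysis file from CSV: factor columns such as `educ_cat` come back as plain text, so the level order is no longer stored. In this report the alphabetical order of the labels happens to match the intended order, and the reference categories in the regression tables are as intended. The sketch below shows the behaviour on toy data with base R; `saveRDS()` is shown only as one possible alternative, not as what the report does.

```r
# Toy factor with an explicit, non-alphabetical level order (hypothetical data)
toy <- data.frame(
  size = factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"))
)

csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

write.csv(toy, csv_path, row.names = FALSE)
saveRDS(toy, rds_path)

from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)  # size comes back as character
from_rds <- readRDS(rds_path)                             # size keeps its factor levels

class(from_csv$size)
levels(from_rds$size)
```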

3.2 Descriptive Results

First, let's look at the overall prevalence.

svymean(x = ~digital_payment, design = design, na.rm = TRUE)
##                    mean     SE
## digital_payment 0.52842 0.0019

Second, let's look at subgroups by education category, income quintile, and age group.

svyby(formula = ~digital_payment, by = ~educ_cat, design = design, svymean, na.rm = TRUE)
##                        educ_cat digital_payment          se
## Primary or less Primary or less       0.3727747 0.003131875
## Secondary             Secondary       0.6038913 0.002563764
## Tertiary+             Tertiary+       0.8403202 0.003698417
svyby(formula = ~digital_payment, by = ~income_quintile, design = design, svymean, na.rm = TRUE)
##            income_quintile digital_payment          se
## Q1 poorest      Q1 poorest       0.3925156 0.004472808
## Q2                      Q2       0.4703337 0.004467954
## Q3                      Q3       0.5278006 0.004334823
## Q4                      Q4       0.5818216 0.004200231
## Q5 richest      Q5 richest       0.6656116 0.003799194
svyby(formula = ~digital_payment, by = ~age_group, design = design, svymean, na.rm = TRUE)
##       age_group digital_payment          se
## 15-24     15-24       0.4362003 0.004006695
## 25-39     25-39       0.5971618 0.003316999
## 40-54     40-54       0.5537016 0.003996658
## 55+         55+       0.5068094 0.004151287

3.2.1 Descriptive patterns

Digital payment use rises sharply with education. The weighted shares show a strong gradient, from about 37% among respondents with primary education or less to about 84% among those with tertiary education. A similar but smaller gradient appears across the within-economy income distribution, from about 39% in the poorest quintile to about 67% in the richest.

In summary, education shows the largest descriptive gap, followed by income and then age, where use peaks in the 25-39 group (about 60%) and is lowest among those aged 15-24 (about 44%). To check whether these patterns hold once other characteristics are accounted for, the next section fits regression models.
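
The ranking of the three gradients can be checked directly from the weighted shares reported above. The sketch below copies those shares by hand and computes the gap between the highest and lowest group in percentage points; the helper name `gap_pp` is hypothetical.

```r
# Weighted shares copied from the svyby() output above
educ   <- c(primary = 0.3727747, secondary = 0.6038913, tertiary = 0.8403202)
income <- c(q1 = 0.3925156, q2 = 0.4703337, q3 = 0.5278006,
            q4 = 0.5818216, q5 = 0.6656116)
age    <- c(a15_24 = 0.4362003, a25_39 = 0.5971618,
            a40_54 = 0.5537016, a55_plus = 0.5068094)

# Gap between the highest and lowest group, in percentage points
gap_pp <- function(x) 100 * (max(x) - min(x))

gaps <- round(c(education = gap_pp(educ),
                income    = gap_pp(income),
                age       = gap_pp(age)), 1)
gaps
```

These are unadjusted gaps, so they describe the raw pattern only and say nothing about which characteristic matters once the others are held fixed.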

3.3 Regression Results

We use survey-weighted logistic regression to fit the models. ‘m1’ is the primary model studied in this analysis. ‘m2’ is a secondary model with additional variables that may be useful for further analysis.

3.3.1 Building the model(s)

# m1 - create a primary model to study
m1 <- svyglm(formula = digital_payment ~ female_bin + age + educ_cat + income_quintile + in_workforce + urban, 
             design = design, 
             family = quasibinomial())
# m2 - create a secondary model, with additional variables to study
m2 <- svyglm(formula = digital_payment ~ female_bin + age + educ_cat + income_quintile + in_workforce + urban + internet_use + mobile_phone_bin + dig_account,
             design = design,
             family = quasibinomial())

3.3.2 Exporting the model tables

3.3.2.1 m1 regression table

library(gtsummary)  # provides tbl_regression() and the modify_*() helpers
tbl_regression(m1, exponentiate = TRUE) %>%
  modify_caption("**Determinants of Digital Payment Use**") %>%
  modify_header(label = "**Characteristics**") %>%
  modify_footnote(everything() ~ "Odds ratios reported. Estimates are weighted using survey sampling weights. Reference categories used are primary education or less, poorest income quintile, out of workforce, rural residence, and male respondents") 
Determinants of Digital Payment Use

Characteristics        OR     95% CI        p-value
female_bin             0.87   0.84, 0.90    <0.001
age                    1.01   1.01, 1.01    <0.001
educ_cat
    Primary or less    (reference)
    Secondary          2.78   2.68, 2.88    <0.001
    Tertiary+          7.03   6.60, 7.49    <0.001
income_quintile
    Q1 poorest         (reference)
    Q2                 1.26   1.19, 1.33    <0.001
    Q3                 1.42   1.34, 1.50    <0.001
    Q4                 1.63   1.54, 1.72    <0.001
    Q5 richest         1.96   1.86, 2.07    <0.001
in_workforce           2.23   2.15, 2.31    <0.001
urban                  1.14   1.10, 1.18    <0.001

Note: Odds ratios reported. Estimates are weighted using survey sampling weights. Reference categories used are primary education or less, poorest income quintile, out of workforce, rural residence, and male respondents.
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

3.3.2.2 m2 regression table

tbl_regression(m2, exponentiate = TRUE) %>%
  modify_caption("**Determinants of Digital Payment Use**") %>%
  modify_header(label = "**Characteristics**") %>%
  modify_footnote(everything() ~ "Odds ratios reported. Estimates are weighted using survey sampling weights. Reference categories used are primary education or less, poorest income quintile, out of workforce, rural residence, male respondents, non internet users, and non mobile phone users.")
Determinants of Digital Payment Use

Characteristics        OR     95% CI        p-value
female_bin             0.87   0.83, 0.92    <0.001
age                    1.03   1.03, 1.03    <0.001
educ_cat
    Primary or less    (reference)
    Secondary          1.44   1.36, 1.53    <0.001
    Tertiary+          2.41   2.18, 2.65    <0.001
income_quintile
    Q1 poorest         (reference)
    Q2                 1.14   1.05, 1.24    0.002
    Q3                 1.17   1.08, 1.27    <0.001
    Q4                 1.24   1.14, 1.35    <0.001
    Q5 richest         1.35   1.24, 1.46    <0.001
in_workforce           1.44   1.37, 1.52    <0.001
urban                  0.92   0.88, 0.98    0.004
internet_use           1.44   1.35, 1.53    <0.001
mobile_phone_bin       1.47   1.35, 1.59    <0.001
dig_account            85.4   80.0, 91.1    <0.001

Note: Odds ratios reported. Estimates are weighted using survey sampling weights. Reference categories used are primary education or less, poorest income quintile, out of workforce, rural residence, male respondents, non internet users, and non mobile phone users.
Abbreviations: CI = Confidence Interval, OR = Odds Ratio

Regression table notes: In m2, internet use (OR 1.44) and mobile phone ownership (OR 1.47) are both positively associated with digital payment use. The very large odds ratio for dig_account (85.4) is less informative: a respondent who has a digital account is very likely to have made or received a digital payment, so the two variables measure nearly the same thing.

3.3.3 Regression patterns

The tables above report odds ratios from survey-weighted logistic regressions. Odds ratios greater than 1 indicate a positive association with digital payment use, while odds ratios below 1 indicate a negative association.

Relative to respondents with primary education or less, those with secondary and tertiary education exhibit substantially higher odds of digital payment use. Higher income quintiles and labour-force participation are also positively associated with digital payment use, while women show slightly lower odds than men, conditional on the included covariates.
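
Odds ratios act on odds, not on probabilities, so their practical size depends on the baseline. The sketch below shows the arithmetic using base R's `qlogis()` and `plogis()`. It combines an unadjusted baseline share from the descriptive table with an adjusted odds ratio from m1, so the result is only an illustration of the conversion, not a model prediction. The helper name `apply_odds_ratio` is hypothetical.

```r
# Convert a baseline probability and an odds ratio into an implied probability:
# move to log-odds, add log(OR), and move back to the probability scale
apply_odds_ratio <- function(p0, or) {
  plogis(qlogis(p0) + log(or))
}

p_primary <- 0.3727747  # weighted share, primary or less (descriptive table)
or_tert   <- 7.03       # m1 odds ratio, tertiary vs primary or less

p_implied <- apply_odds_ratio(p_primary, or_tert)
p_implied
```

An odds ratio of 7 therefore does not mean a sevenfold probability; starting from a baseline of roughly 37%, it implies a probability of roughly 80%.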


4 General and Predicted Probability Visualisation

4.1 Plot 1: Weighted digital payment use by education

library(scales)  # provides percent_format()
edu_plot_data <- svyby(formula = ~digital_payment, 
                       by = ~educ_cat, 
                       design = design, 
                       svymean, 
                       na.rm = TRUE) %>%
  as_tibble()

edu_plot_data %>%
  ggplot(aes(x = educ_cat, y = digital_payment)) +
  geom_col() +
  scale_y_continuous(labels = percent_format()) +
  labs (title = "Weighted Digital Payment Use by Education Level",
        x = NULL,
        y = "share using digital payments") +
  theme_classic()

4.1.1 Saving the figure

ggsave(here("output", "figures", "digital_payment_by_education.png"))  

4.2 Plot 2: Weighted digital payment use by income quintile

income_plot_data <- svyby(formula = ~digital_payment, 
                          by = ~income_quintile, 
                          design = design, 
                          svymean, 
                          na.rm = TRUE) %>%
  as_tibble()

income_plot_data %>%
  ggplot(aes(x = income_quintile, y = digital_payment)) +
  geom_col() +
  scale_y_continuous(labels = percent_format()) +
  labs (title = "Weighted Digital Payment Use by Income Quintile",
        x = "Income groups (in quintile)",
        y = "share using digital payments")+
  theme_classic()

4.2.1 Saving the figure

ggsave(here("output", "figures", "digital_payment_by_income_quintile.png"))

4.3 Plot 3: Predicted probabilities by certain variables

To complement the odds-ratio table, predicted probabilities are calculated from the fitted model while holding other covariates constant. This helps translate regression coefficients into more intuitive quantities.

The question we are trying to answer is: "Given the model, what is the predicted chance of using digital payments for different groups, while holding other variables fixed?" The primary model (m1) is used for this analysis.

The basic workflow for generating the predicted probabilities is as follows:
- fit the model;
- create a new dataset of covariate profiles;
- use the ‘predict()’ function to generate the predicted probabilities;
- attach the predicted probabilities to the new dataset;
- plot them with the ‘ggplot()’ function.
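
The same workflow can be sketched end to end on simulated data. Note the assumptions: this uses base R's `glm()` rather than `svyglm()`, the data and coefficients are invented, and the approximate 95% intervals (computed on the link scale and then transformed) are an addition that the report itself does not produce.

```r
set.seed(1)

# 1) Simulated data (hypothetical; not Findex)
n   <- 500
sim <- data.frame(
  educ = factor(sample(c("Low", "Mid", "High"), n, replace = TRUE),
                levels = c("Low", "Mid", "High")),
  age  = round(runif(n, 15, 80))
)
eta   <- -1 + 0.8 * (sim$educ == "Mid") + 1.6 * (sim$educ == "High") + 0.01 * sim$age
sim$y <- rbinom(n, 1, plogis(eta))

# 2) Fit the model
fit <- glm(y ~ educ + age, data = sim, family = binomial())

# 3) New data: vary educ, hold age at its mean
newdata <- data.frame(educ = factor(levels(sim$educ), levels = levels(sim$educ)),
                      age  = mean(sim$age))

# 4) Predict on the link scale, then transform, so intervals stay within [0, 1]
link <- predict(fit, newdata = newdata, type = "link", se.fit = TRUE)
newdata$pred_prob <- plogis(link$fit)
newdata$lower     <- plogis(link$fit - 1.96 * link$se.fit)
newdata$upper     <- plogis(link$fit + 1.96 * link$se.fit)

newdata
```

The resulting data frame has one row per education level and can be passed straight to `ggplot()`.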

4.3.1 Plot 3a: Predicted probabilities of digital payments by education level

This plot compares three education levels for male respondents at average age, in the middle income quintile, in the workforce, and living in an urban area.

# 1) pred_educ - create the dataset
pred_educ <- tibble(female_bin = 0,
                    age = mean(findex_analysis$age, na.rm = TRUE),
                    educ_cat = factor(c("Primary or less", "Secondary", "Tertiary+"),
                                      levels = levels(findex_analysis$educ_cat)),
                    income_quintile = factor("Q3", levels = levels(findex_analysis$income_quintile)),
                    in_workforce = 1,
                    urban = 1)

# 2) pred_prob - generate the predicted probabilities using the predict() function
#    and attach them to pred_educ so that ggplot() can find the column
pred_educ$pred_prob <- as.numeric(predict(m1,
                                          newdata = pred_educ,
                                          type = "response"))

# 3) plot the predicted probabilities
pred_educ %>%
  ggplot(aes(x = educ_cat, y = pred_prob)) +
  geom_col() +
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  labs(title = "Predicted Probability of Digital Payment Use by Education Level",
       subtitle = "Predictions from survey-weighted logistic regression, holding other covariates constant",
       x = NULL,
       y = "predicted probability") +
  theme_classic()

Takeaway: The figure above shows model-predicted probabilities of digital payment use by education level, holding age, income quintile, workforce status, sex, and urban residence constant. The predicted probability rises sharply with education, indicating that educational attainment is strongly associated with digital financial inclusion.

4.3.1.1 Saving the figure

ggsave(here("output", "figures", "prediction_probabilities_by_education.png"))

4.3.2 Plot 3b: Predicted probabilities of digital payments by income level

This plot compares five income quintiles for male respondents at average age, with secondary education, in the workforce, and living in an urban area.

# 1) create the dataset
pred_income <- tibble(female_bin = 0, 
                      age = mean(findex_analysis$age, na.rm = TRUE),
                      educ_cat = factor("Secondary", levels = levels(findex_analysis$educ_cat)),
                      income_quintile = factor(c("Q1 poorest", "Q2", "Q3", "Q4", "Q5 richest"),
                                          levels = levels(findex_analysis$income_quintile)),
                      in_workforce = 1,
                      urban = 1)

# 2) generate the predicted probabilities
pred_income$pred_prob <- predict(m1,
                                 newdata = pred_income,
                                 type = "response")
# 3) plot the predicted probabilities
pred_income %>%
  ggplot(aes(x = income_quintile, y = pred_prob)) +
  geom_col() +
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  labs(title = "Predicted Probability of Digital Payment Use by Income Quintile",
       subtitle = "Predictions from survey-weighted logistic regression, holding other covariates constant",
       x = NULL,
       y = "predicted probability") +
  theme_classic()

Takeaway: The figure above shows model-predicted probabilities of digital payment use by income level, holding age, education level, workforce status, sex, and urban residence constant. The predicted probability rises with income, indicating that income is associated with digital financial inclusion.

4.3.2.1 Saving the figure

ggsave(here("output", "figures", "prediction_probabilities_by_income_quintile.png"))

The same analysis and plots could be produced for internet use and mobile phone ownership by adapting the code above to the secondary model (m2), which includes those variables. To keep this document streamlined, the analysis is limited to education level and income quintile.


5 Key findings and interpretation

This analysis finds clear socioeconomic gradients in digital payment use. Education is one of the strongest correlates, as respondents with higher educational attainment are much more likely to use digital payments. Income gradients are also substantial, both descriptively and in model-based predicted probabilities. In the primary model (m1), workforce participation and urban residence are also positively associated with digital payment adoption, while women exhibit slightly lower odds than men after controlling for observed characteristics.

Overall, the results suggest that digital financial inclusion remains closely linked to educational attainment, labour-market attachment, and economic resources.


6 Limitations

This analysis is descriptive and associational rather than causal. The Global Findex is a cross-sectional survey, so the results should not be interpreted as causal effects of education or income on digital payment use.

In addition, the survey mode differs across contexts, with face-to-face interviews in many low- and middle-income economies and phone surveys in many high-income economies, which may limit the comparability of responses across economies.

Finally, the outcome variable captures whether respondents made or received a digital payment in the past year, but does not measure frequency or intensity of use.


7 Reproducibility notes

This report was produced in R Markdown from a single master script that performs data import, cleaning, variable construction, survey-weighted estimation, and figure generation. All outputs are generated programmatically from the raw microdata.
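
One small addition that can help reproducibility is recording the R version and package versions alongside the outputs. The sketch below writes `sessionInfo()` to a text file; the file name and the use of a temporary directory are placeholders, not part of the original script.

```r
# Write the R version and attached package versions to a text file.
# The path is a placeholder; a real run might store it with the other outputs.
session_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), session_file)

file.exists(session_file)
```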


8 Reference List

World Bank. (2025). The Global Findex Database 2025: Connectivity and financial inclusion in the digital economy [Microdata]. World Bank Microdata Library. https://microdata.worldbank.org/catalog/7860/get-microdata

World Bank. (2025). The Global Findex Database 2025: Connectivity and financial inclusion in the digital economy (Study description). World Bank Microdata Library. https://microdata.worldbank.org/catalog/7860/study-description