Latent Class Analysis

Author

Richmond Silvanus Baye

Published

May 13, 2025

Project Introduction

Latent Class Analysis (LCA)

LCA is a statistical method used to identify unobserved (latent) subgroups within a population based on patterns in categorical data. It is particularly useful when individuals exhibit different combinations of behaviors or responses that are not easily captured by observed variables alone.

USE CASES

  • In a business context, LCA helps uncover hidden customer segments (latent classes) based on shared attributes or behaviors. This enables more targeted marketing, risk management, or product design strategies.

  • In clinical or observational studies, LCA can reveal hidden subgroups based on symptom profiles, biomarker patterns, adherence behaviors, or side effects. For instance, LCA can help group asthma patients based on self-reported symptom control, medication use, and health behaviors to identify phenotypes for targeted interventions.

  • In patient-reported outcomes (PROs), LCA can uncover distinct classes of patients based on how they report quality of life, functional status, or treatment satisfaction in PRO instruments.

In this tutorial, I apply LCA using the poLCA package to classify Nigerian households into distinct subgroups based on their access to and use of financial services, using binary indicators such as savings, access to credit, insurance, remittances and informal savings. This approach enables us to move beyond single-variable summaries and uncover meaningful patterns in financial behavior, which can inform inclusive financial policy, tailored interventions, and targeted outreach strategies to improve household welfare.

Let’s begin by loading the packages.

pacman::p_load(ggplot2, gtsummary, poLCA, haven, tidyverse, dplyr, reshape2)

Loading and preparing the data

I will load the cleaned data from the LSMS dataset.

data <- read_dta("/Users/richmondsilvanusbaye/Documents/analysis/fin_data.dta")
set.seed(43)
names(data)
 [1] "hhid"             "wave"             "age"              "hhsize"          
 [5] "remittance"       "credit"           "savings_ass"      "savings_bank"    
 [9] "savings_coop"     "savings_informal" "savings_microf"   "insurance"       
[13] "zone"             "gender"           "education"        "location"        

Frequency table

Among the 4,568 household observations in the data (households appear once per survey wave), access to and use of financial services varied considerably. Remittance usage was notably low, with only 1.8% reporting use. Credit access was more prevalent, at 20%. Savings through associations and cooperatives were rare (2.4% and 4.2%, respectively), while savings in formal banking institutions were more common, at 30%. Informal savings emerged as the most widely used option, reported by 43%. In contrast, use of microfinance services and insurance remained very limited, at 1.8% and 2.6%, respectively.

data |>
  mutate(across(c(remittance, credit, savings_ass, savings_bank, savings_coop,
                  savings_informal, savings_microf, insurance),
                ~ factor(., levels = c(1, 2), labels = c("Yes", "No")))) |>
  select(remittance, credit, savings_ass, savings_bank, savings_coop,
         savings_informal, savings_microf, insurance) |>
  tbl_summary(
    type = all_categorical() ~ "categorical",
    statistic = all_categorical() ~ "{n} ({p}%)",
    missing = "no"
  )
Characteristic        N = 4,568¹
remittance
    Yes               83 (1.8%)
    No                4,485 (98%)
credit
    Yes               903 (20%)
    No                3,665 (80%)
savings_ass
    Yes               108 (2.4%)
    No                4,460 (98%)
savings_bank
    Yes               1,392 (30%)
    No                3,176 (70%)
savings_coop
    Yes               194 (4.2%)
    No                4,374 (96%)
savings_informal
    Yes               1,946 (43%)
    No                2,622 (57%)
savings_microf
    Yes               80 (1.8%)
    No                4,488 (98%)
insurance
    Yes               121 (2.6%)
    No                4,447 (97%)
¹ n (%)
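
The counts in this table also determine the simplest model fitted later in the tutorial. With a single class there is nothing to mix over, so the one-class model treats the eight indicators as independent and its item probabilities are simply the marginal "Yes" shares. The sketch below, in base R, rebuilds those shares and the implied log-likelihood from the table counts alone. The object names are illustrative and not part of the original analysis.

```r
# "Yes" counts copied from the frequency table; n_obs is the number of rows
n_obs <- 4568
yes <- c(remittance = 83, credit = 903, savings_ass = 108, savings_bank = 1392,
         savings_coop = 194, savings_informal = 1946, savings_microf = 80,
         insurance = 121)

# Marginal probability of "Yes" for each indicator
p_yes <- yes / n_obs
round(p_yes, 4)

# Log-likelihood of the independence (one-class) model:
# each indicator contributes n_yes * log(p) + n_no * log(1 - p)
ll_one_class <- sum(yes * log(p_yes) + (n_obs - yes) * log(1 - p_yes))
ll_one_class
```

The result should agree, up to rounding, with the maximum log-likelihood of -10885.97 that poLCA reports for the one-class fit, which is a quick way to confirm that the indicators were coded as intended.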

Selecting Financial Inclusion Columns For Clustering

Our dataset consists of a mix of demographic variables and financial access and use variables. Since the goal is to cluster households by financial behavior and use, I selected only the financial inclusion columns from the data.

f <- cbind(remittance, credit, savings_ass, savings_bank, savings_coop,
           savings_informal, savings_microf, insurance) ~ 1

After selecting the variables, we can now implement the latent class analysis.

Latent Class Analysis

I will perform Latent Class Analysis (LCA) to determine the optimal number of classes.

I begin by implementing a series of latent class models ranging from 1 to 7 classes. For each specification, I then extract the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values to evaluate model fit. The model with the lowest BIC and the model with the lowest AIC are identified as the optimal solutions under each respective criterion.

# Initialize vectors to store BIC and AIC values
BIC_values <- numeric()
AIC_values <- numeric()
model_list <- list()

# Loop over the number of classes, from 1 to 7
for (n in 1:7) {
  # First fit: its printed summary is captured as text in lca_model and is not
  # used again. This does not silence the second fit below; passing
  # verbose = FALSE to poLCA() would avoid the duplicate fit.
  lca_model <- suppressMessages(
    capture.output(
      poLCA(f, data, nclass = n, graphs = FALSE)
    )
  )

  # Second fit: the model whose summary is printed and whose BIC and AIC are stored
  model <- poLCA(f, data, nclass = n, graphs = FALSE)
  model_list[[n]] <- model
  BIC_values[n] <- model$bic
  AIC_values[n] <- model$aic
}
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 
 
$remittance
           Pr(1)  Pr(2)
class 1:  0.0182 0.9818

$credit
           Pr(1)  Pr(2)
class 1:  0.1977 0.8023

$savings_ass
           Pr(1)  Pr(2)
class 1:  0.0236 0.9764

$savings_bank
           Pr(1)  Pr(2)
class 1:  0.3047 0.6953

$savings_coop
           Pr(1)  Pr(2)
class 1:  0.0425 0.9575

$savings_informal
          Pr(1) Pr(2)
class 1:  0.426 0.574

$savings_microf
           Pr(1)  Pr(2)
class 1:  0.0175 0.9825

$insurance
           Pr(1)  Pr(2)
class 1:  0.0265 0.9735

Estimated class population shares 
 1 
 
Predicted class memberships (by modal posterior prob.) 
 1 
 
========================================================= 
Fit for 1 latent classes: 
========================================================= 
number of observations: 4568 
number of estimated parameters: 8 
residual degrees of freedom: 247 
maximum log-likelihood: -10885.97 
 
AIC(1): 21787.94
BIC(1): 21839.36
G^2(1): 1005.222 (Likelihood ratio/deviance statistic) 
X^2(1): 2437.475 (Chi-square goodness of fit) 
 
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 
 
$remittance
           Pr(1)  Pr(2)
class 1:  0.0159 0.9841
class 2:  0.0215 0.9785

$credit
           Pr(1)  Pr(2)
class 1:  0.0000 1.0000
class 2:  0.4917 0.5083

$savings_ass
           Pr(1)  Pr(2)
class 1:  0.0000 1.0000
class 2:  0.0588 0.9412

$savings_bank
           Pr(1)  Pr(2)
class 1:  0.2505 0.7495
class 2:  0.3853 0.6147

$savings_coop
           Pr(1)  Pr(2)
class 1:  0.0000 1.0000
class 2:  0.1056 0.8944

$savings_informal
           Pr(1)  Pr(2)
class 1:  0.2649 0.7351
class 2:  0.6656 0.3344

$savings_microf
           Pr(1)  Pr(2)
class 1:  0.0019 0.9981
class 2:  0.0407 0.9593

$insurance
           Pr(1)  Pr(2)
class 1:  0.0134 0.9866
class 2:  0.0460 0.9540

Estimated class population shares 
 0.5979 0.4021 
 
Predicted class memberships (by modal posterior prob.) 
 0.6824 0.3176 
 
========================================================= 
Fit for 2 latent classes: 
========================================================= 
number of observations: 4568 
number of estimated parameters: 17 
residual degrees of freedom: 238 
maximum log-likelihood: -10596.48 
 
AIC(2): 21226.96
BIC(2): 21336.21
G^2(2): 426.2338 (Likelihood ratio/deviance statistic) 
X^2(2): 702.9694 (Chi-square goodness of fit) 
 
ALERT: iterations finished, MAXIMUM LIKELIHOOD NOT FOUND 
 
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 
 
$remittance
           Pr(1)  Pr(2)
class 1:  0.0039 0.9961
class 2:  0.0657 0.9343
class 3:  0.0129 0.9871

$credit
           Pr(1)  Pr(2)
class 1:  0.5914 0.4086
class 2:  0.3266 0.6734
class 3:  0.0000 1.0000

$savings_ass
          Pr(1) Pr(2)
class 1:  0.060 0.940
class 2:  0.054 0.946
class 3:  0.001 0.999

$savings_bank
           Pr(1)  Pr(2)
class 1:  0.1768 0.8232
class 2:  0.8656 0.1344
class 3:  0.2256 0.7744

$savings_coop
           Pr(1)  Pr(2)
class 1:  0.0655 0.9345
class 2:  0.1801 0.8199
class 3:  0.0000 1.0000

$savings_informal
           Pr(1)  Pr(2)
class 1:  0.7347 0.2653
class 2:  0.4927 0.5073
class 3:  0.2792 0.7208

$savings_microf
           Pr(1)  Pr(2)
class 1:  0.0175 0.9825
class 2:  0.0835 0.9165
class 3:  0.0018 0.9982

$insurance
           Pr(1)  Pr(2)
class 1:  0.0004 0.9996
class 2:  0.1676 0.8324
class 3:  0.0040 0.9960

Estimated class population shares 
 0.2552 0.1431 0.6017 
 
Predicted class memberships (by modal posterior prob.) 
 0.1721 0.0865 0.7415 
 
========================================================= 
Fit for 3 latent classes: 
========================================================= 
number of observations: 4568 
number of estimated parameters: 26 
residual degrees of freedom: 229 
maximum log-likelihood: -10473.25 
 
AIC(3): 20998.51
BIC(3): 21165.61
G^2(3): 179.7855 (Likelihood ratio/deviance statistic) 
X^2(3): 287.4108 (Chi-square goodness of fit) 
 
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 
 
$remittance
           Pr(1)  Pr(2)
class 1:  0.0357 0.9643
class 2:  0.0091 0.9909
class 3:  0.0546 0.9454
class 4:  0.0043 0.9957

$credit
           Pr(1)  Pr(2)
class 1:  0.0000 1.0000
class 2:  0.0211 0.9789
class 3:  0.5484 0.4516
class 4:  0.5696 0.4304

$savings_ass
           Pr(1)  Pr(2)
class 1:  0.0176 0.9824
class 2:  0.0000 1.0000
class 3:  0.0462 0.9538
class 4:  0.0637 0.9363

$savings_bank
           Pr(1)  Pr(2)
class 1:  0.8150 0.1850
class 2:  0.0008 0.9992
class 3:  0.8216 0.1784
class 4:  0.1817 0.8183

$savings_coop
           Pr(1)  Pr(2)
class 1:  0.0205 0.9795
class 2:  0.0000 1.0000
class 3:  0.2519 0.7481
class 4:  0.0597 0.9403

$savings_informal
           Pr(1)  Pr(2)
class 1:  0.3096 0.6904
class 2:  0.2720 0.7280
class 3:  0.5606 0.4394
class 4:  0.7640 0.2360

$savings_microf
           Pr(1)  Pr(2)
class 1:  0.0103 0.9897
class 2:  0.0020 0.9980
class 3:  0.1166 0.8834
class 4:  0.0144 0.9856

$insurance
           Pr(1)  Pr(2)
class 1:  0.0561 0.9439
class 2:  0.0000 1.0000
class 3:  0.1479 0.8521
class 4:  0.0002 0.9998

Estimated class population shares 
 0.226 0.4398 0.0929 0.2412 
 
Predicted class memberships (by modal posterior prob.) 
 0.225 0.5462 0.0567 0.1721 
 
========================================================= 
Fit for 4 latent classes: 
========================================================= 
number of observations: 4568 
number of estimated parameters: 35 
residual degrees of freedom: 220 
maximum log-likelihood: -10456.36 
 
AIC(4): 20982.73
BIC(4): 21207.66
G^2(4): 146.003 (Likelihood ratio/deviance statistic) 
X^2(4): 197.5178 (Chi-square goodness of fit) 
 
ALERT: iterations finished, MAXIMUM LIKELIHOOD NOT FOUND 
 
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 
 
$remittance
           Pr(1)  Pr(2)
class 1:  0.0456 0.9544
class 2:  0.0000 1.0000
class 3:  0.0316 0.9684
class 4:  0.0096 0.9904
class 5:  0.0612 0.9388

$credit
           Pr(1)  Pr(2)
class 1:  0.5346 0.4654
class 2:  0.9621 0.0379
class 3:  0.2220 0.7780
class 4:  0.0000 1.0000
class 5:  0.0000 1.0000

$savings_ass
           Pr(1)  Pr(2)
class 1:  0.0356 0.9644
class 2:  0.0537 0.9463
class 3:  0.0964 0.9036
class 4:  0.0040 0.9960
class 5:  0.0156 0.9844

$savings_bank
           Pr(1)  Pr(2)
class 1:  0.8330 0.1670
class 2:  0.0655 0.9345
class 3:  0.4860 0.5140
class 4:  0.1451 0.8549
class 5:  1.0000 0.0000

$savings_coop
           Pr(1)  Pr(2)
class 1:  0.2166 0.7834
class 2:  0.0512 0.9488
class 3:  0.0744 0.9256
class 4:  0.0044 0.9956
class 5:  0.0414 0.9586

$savings_informal
           Pr(1)  Pr(2)
class 1:  0.5470 0.4530
class 2:  0.6959 0.3041
class 3:  0.9996 0.0004
class 4:  0.2955 0.7045
class 5:  0.0628 0.9372

$savings_microf
           Pr(1)  Pr(2)
class 1:  0.1166 0.8834
class 2:  0.0206 0.9794
class 3:  0.0000 1.0000
class 4:  0.0035 0.9965
class 5:  0.0072 0.9928

$insurance
           Pr(1)  Pr(2)
class 1:  0.1331 0.8669
class 2:  0.0000 1.0000
class 3:  0.0134 0.9866
class 4:  0.0037 0.9963
class 5:  0.1214 0.8786

Estimated class population shares 
 0.1063 0.123 0.1013 0.5972 0.0722 
 
Predicted class memberships (by modal posterior prob.) 
 0.0893 0.1292 0.09 0.6756 0.016 
 
========================================================= 
Fit for 5 latent classes: 
========================================================= 
number of observations: 4568 
number of estimated parameters: 44 
residual degrees of freedom: 211 
maximum log-likelihood: -10445.38 
 
AIC(5): 20978.76
BIC(5): 21261.54
G^2(5): 124.0342 (Likelihood ratio/deviance statistic) 
X^2(5): 194.8473 (Chi-square goodness of fit) 
 
ALERT: iterations finished, MAXIMUM LIKELIHOOD NOT FOUND 
 
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 
 
$remittance
           Pr(1)  Pr(2)
class 1:  0.0686 0.9314
class 2:  0.0091 0.9909
class 3:  0.0621 0.9379
class 4:  0.0029 0.9971
class 5:  0.0104 0.9896
class 6:  0.0337 0.9663

$credit
           Pr(1)  Pr(2)
class 1:  0.4712 0.5288
class 2:  0.0000 1.0000
class 3:  0.0000 1.0000
class 4:  0.9933 0.0067
class 5:  0.6104 0.3896
class 6:  0.1035 0.8965

$savings_ass
           Pr(1)  Pr(2)
class 1:  0.0212 0.9788
class 2:  0.0006 0.9994
class 3:  0.0000 1.0000
class 4:  0.0535 0.9465
class 5:  0.0599 0.9401
class 6:  0.1255 0.8745

$savings_bank
           Pr(1)  Pr(2)
class 1:  0.8607 0.1393
class 2:  0.1468 0.8532
class 3:  1.0000 0.0000
class 4:  0.1428 0.8572
class 5:  0.5277 0.4723
class 6:  0.5339 0.4661

$savings_coop
           Pr(1)  Pr(2)
class 1:  0.1068 0.8932
class 2:  0.0027 0.9973
class 3:  0.0645 0.9355
class 4:  0.0000 1.0000
class 5:  0.6478 0.3522
class 6:  0.0000 1.0000

$savings_informal
           Pr(1)  Pr(2)
class 1:  0.5200 0.4800
class 2:  0.3114 0.6886
class 3:  0.0978 0.9022
class 4:  0.7107 0.2893
class 5:  0.7573 0.2427
class 6:  0.7489 0.2511

$savings_microf
           Pr(1)  Pr(2)
class 1:  0.1883 0.8117
class 2:  0.0033 0.9967
class 3:  0.0000 1.0000
class 4:  0.0220 0.9780
class 5:  0.0000 1.0000
class 6:  0.0000 1.0000

$insurance
           Pr(1)  Pr(2)
class 1:  0.1541 0.8459
class 2:  0.0044 0.9956
class 3:  0.1444 0.8556
class 4:  0.0000 1.0000
class 5:  0.0899 0.9101
class 6:  0.0005 0.9995

Estimated class population shares 
 0.0674 0.596 0.0648 0.1289 0.0455 0.0974 
 
Predicted class memberships (by modal posterior prob.) 
 0.0346 0.6718 0.0208 0.1517 0.0348 0.0863 
 
========================================================= 
Fit for 6 latent classes: 
========================================================= 
number of observations: 4568 
number of estimated parameters: 53 
residual degrees of freedom: 202 
maximum log-likelihood: -10436.08 
 
AIC(6): 20978.16
BIC(6): 21318.78
G^2(6): 105.4347 (Likelihood ratio/deviance statistic) 
X^2(6): 218.9361 (Chi-square goodness of fit) 
 
ALERT: iterations finished, MAXIMUM LIKELIHOOD NOT FOUND 
 
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 
 
$remittance
           Pr(1)  Pr(2)
class 1:  0.0633 0.9367
class 2:  0.0038 0.9962
class 3:  0.0581 0.9419
class 4:  0.0174 0.9826
class 5:  0.0037 0.9963
class 6:  0.0905 0.9095
class 7:  0.0497 0.9503

$credit
           Pr(1)  Pr(2)
class 1:  0.0585 0.9415
class 2:  0.5647 0.4353
class 3:  0.5053 0.4947
class 4:  0.0000 1.0000
class 5:  0.0031 0.9969
class 6:  0.0000 1.0000
class 7:  0.6088 0.3912

$savings_ass
           Pr(1)  Pr(2)
class 1:  0.0000 1.0000
class 2:  0.0574 0.9426
class 3:  0.0000 1.0000
class 4:  0.0010 0.9990
class 5:  0.0000 1.0000
class 6:  0.4184 0.5816
class 7:  0.0522 0.9478

$savings_bank
           Pr(1)  Pr(2)
class 1:  1.0000 0.0000
class 2:  0.1767 0.8233
class 3:  0.7306 0.2694
class 4:  0.2540 0.7460
class 5:  0.0671 0.9329
class 6:  1.0000 0.0000
class 7:  0.8061 0.1939

$savings_coop
           Pr(1)  Pr(2)
class 1:  0.0638 0.9362
class 2:  0.0500 0.9500
class 3:  0.0696 0.9304
class 4:  0.0031 0.9969
class 5:  0.0000 1.0000
class 6:  0.0000 1.0000
class 7:  0.3616 0.6384

$savings_informal
           Pr(1)  Pr(2)
class 1:  0.1418 0.8582
class 2:  0.7236 0.2764
class 3:  0.4342 0.5658
class 4:  0.4299 0.5701
class 5:  0.0307 0.9693
class 6:  0.7691 0.2309
class 7:  0.6740 0.3260

$savings_microf
           Pr(1)  Pr(2)
class 1:  0.0000 1.0000
class 2:  0.0092 0.9908
class 3:  0.9999 0.0001
class 4:  0.0000 1.0000
class 5:  0.0028 0.9972
class 6:  0.0376 0.9624
class 7:  0.0000 1.0000

$insurance
           Pr(1)  Pr(2)
class 1:  0.1458 0.8542
class 2:  0.0000 1.0000
class 3:  0.1241 0.8759
class 4:  0.0093 0.9907
class 5:  0.0000 1.0000
class 6:  0.0000 1.0000
class 7:  0.1613 0.8387

Estimated class population shares 
 0.0774 0.262 0.0141 0.3784 0.1946 0.012 0.0615 
 
Predicted class memberships (by modal posterior prob.) 
 0.0239 0.1845 0.0153 0.3897 0.354 0.0068 0.0258 
 
========================================================= 
Fit for 7 latent classes: 
========================================================= 
number of observations: 4568 
number of estimated parameters: 62 
residual degrees of freedom: 193 
maximum log-likelihood: -10432.27 
 
AIC(7): 20988.54
BIC(7): 21387
G^2(7): 97.81382 (Likelihood ratio/deviance statistic) 
X^2(7): 170.8666 (Chi-square goodness of fit) 
 
ALERT: iterations finished, MAXIMUM LIKELIHOOD NOT FOUND 
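
Several of the fits above (2, 4, 5, 6 and 7 classes) end with the alert "MAXIMUM LIKELIHOOD NOT FOUND", which signals that the EM algorithm stopped at its iteration limit before converging. A common remedy is to raise `maxiter` and to run several random starts with `nrep`, keeping the best one. The sketch below shows the idea on simulated stand-in data, not the LSMS data; the values `maxiter = 5000` and `nrep = 10` and all object names are illustrative assumptions.

```r
library(poLCA)

set.seed(1)

# Simulated stand-in data (not the LSMS data): two classes, five binary
# indicators coded 1 = "Yes", 2 = "No", since poLCA expects positive integers
n <- 1000
true_class <- sample(1:2, n, replace = TRUE, prob = c(0.6, 0.4))
p_yes <- rbind(c(0.1, 0.2, 0.1, 0.3, 0.2),   # class 1: low use
               c(0.7, 0.8, 0.6, 0.9, 0.7))   # class 2: high use
sim <- as.data.frame(
  sapply(1:5, function(j) ifelse(runif(n) < p_yes[true_class, j], 1L, 2L))
)
names(sim) <- paste0("item", 1:5)
f_sim <- cbind(item1, item2, item3, item4, item5) ~ 1

# Fit 1 to 3 classes quietly, with several random starts and a higher
# iteration cap, and record whether each fit stopped at the cap
fits <- lapply(1:3, function(k) {
  poLCA(f_sim, sim, nclass = k, maxiter = 5000, nrep = 10,
        graphs = FALSE, verbose = FALSE)
})
fit_summary <- data.frame(
  nclass  = 1:3,
  bic     = sapply(fits, function(m) m$bic),
  aic     = sapply(fits, function(m) m$aic),
  hit_cap = sapply(fits, function(m) m$numiter >= m$maxiter)
)
fit_summary
```

A `TRUE` in `hit_cap` would flag a fit whose information criteria should be treated with caution, because the reported log-likelihood may not be the maximum.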
 
# Identify the best number of classes
best_BIC_nclass <- which.min(BIC_values)
best_AIC_nclass <- which.min(AIC_values)

# Print summary of model selection
cat("The best model based on BIC is with nclass =", best_BIC_nclass,
    "with BIC =", BIC_values[best_BIC_nclass], "\n")
The best model based on BIC is with nclass = 3 with BIC = 21165.61 
cat("The best model based on AIC is with nclass =", best_AIC_nclass,
    "with AIC =", AIC_values[best_AIC_nclass], "\n")
The best model based on AIC is with nclass = 6 with AIC = 20978.16 

As shown, BIC selects the 3-class model and AIC selects the 6-class model. For the sake of parsimony, I will go with the BIC: unlike the AIC, it imposes a stronger penalty for model complexity and therefore favors a more parsimonious solution. The 6-class fit also ended with a “MAXIMUM LIKELIHOOD NOT FOUND” alert, whereas the 3-class fit in the loop did not. With this in mind, I can proceed to fit the optimal model to identify the latent clusters.
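
To make the difference between the two penalties concrete, the criteria can be rebuilt by hand from the log-likelihoods printed above. This is a minimal base R sketch; the object names are mine, and the parameter count assumes eight binary indicators with no covariates, which matches the "number of estimated parameters" lines in the output.

```r
# Values copied from the poLCA output above, for 1 to 7 classes
n_obs   <- 4568
n_items <- 8
loglik  <- c(-10885.97, -10596.48, -10473.25, -10456.36,
             -10445.38, -10436.08, -10432.27)
n_class <- seq_along(loglik)

# Free parameters: one "Yes" probability per item and class,
# plus (classes - 1) free class shares
n_par <- n_class * n_items + (n_class - 1)

# Both criteria add a complexity penalty to -2 * log-likelihood
aic <- -2 * loglik + 2 * n_par
bic <- -2 * loglik + log(n_obs) * n_par

fit_table <- data.frame(n_class, n_par, loglik,
                        aic = round(aic, 2), bic = round(bic, 2))
fit_table

which.min(aic)  # 6
which.min(bic)  # 3
```

With 4,568 observations, `log(n_obs)` is about 8.4, so each extra parameter costs roughly four times as much under BIC as under AIC. That is why the small likelihood gains beyond three classes are enough for AIC but not for BIC.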

Fitting the Optimal Model

Fitting the LCA model with 3 classes. I don’t like the aesthetics of the default plot, so I set graphs to FALSE. Feel free to set it to TRUE.

model <- poLCA(f, data, nclass = 3, graphs = FALSE)
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 
 
$remittance
           Pr(1)  Pr(2)
class 1:  0.0039 0.9961
class 2:  0.0129 0.9871
class 3:  0.0657 0.9343

$credit
           Pr(1)  Pr(2)
class 1:  0.5914 0.4086
class 2:  0.0000 1.0000
class 3:  0.3266 0.6734

$savings_ass
          Pr(1) Pr(2)
class 1:  0.060 0.940
class 2:  0.001 0.999
class 3:  0.054 0.946

$savings_bank
           Pr(1)  Pr(2)
class 1:  0.1768 0.8232
class 2:  0.2256 0.7744
class 3:  0.8656 0.1344

$savings_coop
           Pr(1)  Pr(2)
class 1:  0.0655 0.9345
class 2:  0.0000 1.0000
class 3:  0.1801 0.8199

$savings_informal
           Pr(1)  Pr(2)
class 1:  0.7347 0.2653
class 2:  0.2792 0.7208
class 3:  0.4927 0.5073

$savings_microf
           Pr(1)  Pr(2)
class 1:  0.0175 0.9825
class 2:  0.0018 0.9982
class 3:  0.0835 0.9165

$insurance
           Pr(1)  Pr(2)
class 1:  0.0004 0.9996
class 2:  0.0040 0.9960
class 3:  0.1676 0.8324

Estimated class population shares 
 0.2552 0.6017 0.1431 
 
Predicted class memberships (by modal posterior prob.) 
 0.1721 0.7415 0.0865 
 
========================================================= 
Fit for 3 latent classes: 
========================================================= 
number of observations: 4568 
number of estimated parameters: 26 
residual degrees of freedom: 229 
maximum log-likelihood: -10473.25 
 
AIC(3): 20998.51
BIC(3): 21165.61
G^2(3): 179.7855 (Likelihood ratio/deviance statistic) 
X^2(3): 287.4135 (Chi-square goodness of fit) 
 

The estimated class population shares are 26%, 60% and 14%. However, these class sizes do not tell us anything about how the classes behave.

class_probs <- model$P  # Estimated class population shares (mixing proportions)

# Create a data frame for plotting
class_df <- data.frame(
  Class = 1:length(class_probs),
  Probability = class_probs
)
class_df
  Class Probability
1     1   0.2552393
2     2   0.6016953
3     3   0.1430655

Plotting Class Membership Probabilities

To visualize the class sizes, we plot the class_df data frame.

ggplot(class_df, aes(x = factor(Class), y = Probability)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  theme_minimal() +
  labs(x = "Latent Class", y = "Membership Probability",
       title = "Class Membership Probabilities")

Predicting Class Membership

We predict the class membership for each household.

predicted_class <- model$predclass
data$class_membership <- predicted_class
head(data)
# A tibble: 6 × 17
   hhid  wave   age hhsize remittance credit    savings_ass savings_bank
  <dbl> <dbl> <dbl>  <dbl> <dbl+lbl>  <dbl+lbl> <dbl+lbl>   <dbl+lbl>   
1 10001     1    50      7 2 [No]     2 [No]    2 [No]      1 [Yes]     
2 10001     2    53      8 2 [No]     2 [No]    2 [No]      1 [Yes]     
3 10001     3    52      6 2 [No]     2 [No]    2 [No]      1 [Yes]     
4 10001     4    55      6 2 [No]     2 [No]    2 [No]      1 [Yes]     
5 10002     1    44      7 2 [No]     2 [No]    2 [No]      1 [Yes]     
6 10002     2    46      8 2 [No]     2 [No]    2 [No]      1 [Yes]     
# ℹ 9 more variables: savings_coop <dbl+lbl>, savings_informal <dbl+lbl>,
#   savings_microf <dbl+lbl>, insurance <dbl+lbl>, zone <dbl+lbl>,
#   gender <dbl+lbl>, education <dbl+lbl>, location <dbl+lbl>,
#   class_membership <int>
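
For intuition, the modal assignment stored in `model$predclass` can be traced by hand for a single response pattern. The sketch below applies Bayes' rule to the rounded estimates printed for the three-class fit above. The household is hypothetical, and the object names are illustrative rather than part of the original analysis.

```r
# Estimates copied from the three-class fit above (rounded as printed).
# Rows are classes; columns are Pr("Yes"), i.e. Pr(1), for each indicator.
items <- c("remittance", "credit", "savings_ass", "savings_bank",
           "savings_coop", "savings_informal", "savings_microf", "insurance")
p_yes <- rbind(
  c(0.0039, 0.5914, 0.060, 0.1768, 0.0655, 0.7347, 0.0175, 0.0004),
  c(0.0129, 0.0000, 0.001, 0.2256, 0.0000, 0.2792, 0.0018, 0.0040),
  c(0.0657, 0.3266, 0.054, 0.8656, 0.1801, 0.4927, 0.0835, 0.1676)
)
colnames(p_yes) <- items
shares <- c(0.2552, 0.6017, 0.1431)

# Hypothetical household: saves in a bank and uses nothing else
uses <- c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)

# Likelihood of this response pattern within each class
# (indicators are assumed independent given the class)
lik <- apply(p_yes, 1, function(p) prod(ifelse(uses, p, 1 - p)))

# Bayes' rule: posterior probability of each class given the pattern
posterior <- shares * lik / sum(shares * lik)
round(posterior, 3)

# Modal assignment: the class with the largest posterior probability
which.max(posterior)  # 2
```

Even though this household saves in a bank, it is assigned to the largest class, because bank savings alone are not unusual there and the class has a much larger prior share.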

To determine which latent class represents low, moderate, or high financial inclusion (FI), we need to examine the conditional response probabilities, i.e., the probabilities of answering “Yes” to each financial access or use variable within each class. To do that, we extract the item response probabilities, conditional on class, from the fitted model.

Extracting Conditional Probabilities

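We extract the conditional probabilities for all variables. Note that the model is refit here, and because poLCA starts from random values, the class order differs from the earlier fit: the classes with shares 0.6017 and 0.1431 swap positions, so classes 2 and 3 below correspond to classes 3 and 2 above. The class_membership column created earlier still uses the earlier numbering.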

model <- poLCA(f, data, nclass = 3, graphs = FALSE)
Conditional item response (column) probabilities,
 by outcome variable, for each class (row) 
 
$remittance
           Pr(1)  Pr(2)
class 1:  0.0039 0.9961
class 2:  0.0657 0.9343
class 3:  0.0129 0.9871

$credit
           Pr(1)  Pr(2)
class 1:  0.5914 0.4086
class 2:  0.3266 0.6734
class 3:  0.0000 1.0000

$savings_ass
          Pr(1) Pr(2)
class 1:  0.060 0.940
class 2:  0.054 0.946
class 3:  0.001 0.999

$savings_bank
           Pr(1)  Pr(2)
class 1:  0.1768 0.8232
class 2:  0.8656 0.1344
class 3:  0.2256 0.7744

$savings_coop
           Pr(1)  Pr(2)
class 1:  0.0655 0.9345
class 2:  0.1801 0.8199
class 3:  0.0000 1.0000

$savings_informal
           Pr(1)  Pr(2)
class 1:  0.7347 0.2653
class 2:  0.4927 0.5073
class 3:  0.2792 0.7208

$savings_microf
           Pr(1)  Pr(2)
class 1:  0.0175 0.9825
class 2:  0.0835 0.9165
class 3:  0.0018 0.9982

$insurance
           Pr(1)  Pr(2)
class 1:  0.0004 0.9996
class 2:  0.1676 0.8324
class 3:  0.0040 0.9960

Estimated class population shares 
 0.2552 0.1431 0.6017 
 
Predicted class memberships (by modal posterior prob.) 
 0.1721 0.0865 0.7415 
 
========================================================= 
Fit for 3 latent classes: 
========================================================= 
number of observations: 4568 
number of estimated parameters: 26 
residual degrees of freedom: 229 
maximum log-likelihood: -10473.25 
 
AIC(3): 20998.51
BIC(3): 21165.61
G^2(3): 179.7855 (Likelihood ratio/deviance statistic) 
X^2(3): 287.4104 (Chi-square goodness of fit) 
 
ALERT: iterations finished, MAXIMUM LIKELIHOOD NOT FOUND 
 
cond_probs <- model$probs
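
Because the class order printed for this fit differs from the order in the earlier three-class fit, any labels saved from the earlier fit need to be translated before they are compared with `cond_probs`. A minimal sketch of that translation, using only the class shares printed in the two outputs; `example_labels` is a hypothetical stand-in for `data$class_membership`.

```r
# Class shares as printed for the two three-class fits
shares_first <- c(0.2552, 0.6017, 0.1431)  # fit used for class_membership
shares_refit <- c(0.2552, 0.1431, 0.6017)  # fit used for cond_probs

# For each class in the first fit, find the refit class with the same share
first_to_refit <- match(shares_first, shares_refit)
first_to_refit  # 1 3 2

# Recode a vector of first-fit labels into refit labels
# (example_labels is a hypothetical stand-in for data$class_membership)
example_labels <- c(2L, 2L, 1L, 3L, 2L)
first_to_refit[example_labels]  # 3 3 1 2 3
```

Matching on shares works here only because both fits reach the same solution with distinct class sizes. A more general option is to reuse the first fitted object instead of refitting, or to fix the order with `poLCA.reorder()` and the `probs.start` argument.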

Creating a Combined Data Frame for Plotting

Having extracted the conditional probabilities, we can now create a data frame that allows us to visualize the probabilities of answering “Yes” to each financial access or use question in the survey within each class.

melted_data <- list()

for (var_name in names(cond_probs)) {
  var_probs <- as.data.frame(cond_probs[[var_name]])
  var_probs$Class <- factor(1:nrow(var_probs))

  var_probs_melt <- melt(var_probs, id.vars = "Class")
  var_probs_melt$Variable <- var_name

  melted_data[[var_name]] <- var_probs_melt
}

plot_data <- do.call(rbind, melted_data)
# The indicators are coded 1 = Yes and 2 = No, so Pr(1) is the probability of "Yes"
plot_data$variable <- factor(plot_data$variable, levels = c("Pr(1)", "Pr(2)"), labels = c("Yes", "No"))

plot_data$Variable <- factor(plot_data$Variable, levels = c("credit", "insurance", "remittance", "savings_ass",
                                                            "savings_bank", "savings_coop", "savings_informal",
                                                            "savings_microf"),
                             labels = c("Credit", "Insurance", "Remittance", "Savings association",
                                        "Savings bank", "Savings cooperative", "Informal savings",
                                        "Microfinance savings"))

Plotting Conditional Probabilities by Latent Class and Variable

ggplot(plot_data, aes(x = Class, y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~ Variable, scales = "free_y") +
  theme_minimal() +
  labs(x = "Latent Class", y = "Conditional Probability",
       title = "Conditional Probabilities by Latent Class and Variable") +
  scale_fill_manual(values = c("No" = "white", "Yes" = "orange")) +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 0, hjust = 1),
        legend.position = "none")
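
As a numeric companion to the plot, the classes can be ranked by the expected number of services a member uses, which is the sum of the "Yes" probabilities across the eight indicators. The values below are copied from the refit output used for the plot, where Pr(1) is the probability of "Yes" because the indicators are coded 1 = Yes and 2 = No. The ranking is a rough heuristic of mine, not part of the poLCA output.

```r
# Pr("Yes") by class, copied from the refit output used for the plot
p_yes <- rbind(
  class1 = c(0.0039, 0.5914, 0.060, 0.1768, 0.0655, 0.7347, 0.0175, 0.0004),
  class2 = c(0.0657, 0.3266, 0.054, 0.8656, 0.1801, 0.4927, 0.0835, 0.1676),
  class3 = c(0.0129, 0.0000, 0.001, 0.2256, 0.0000, 0.2792, 0.0018, 0.0040)
)
colnames(p_yes) <- c("remittance", "credit", "savings_ass", "savings_bank",
                     "savings_coop", "savings_informal", "savings_microf",
                     "insurance")

# Expected number of services used by a member of each class
expected_services <- rowSums(p_yes)
round(expected_services, 2)

# Classes ordered from most to least services used
names(sort(expected_services, decreasing = TRUE))  # "class2" "class1" "class3"

# Service with the highest uptake in each class
colnames(p_yes)[apply(p_yes, 1, which.max)]
```

A single score like this hides the composition of each profile, so it is best read alongside the per-service probabilities rather than in place of them.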

Discussion

The analysis identified three distinct financial behavior profiles among Nigerian households based on their access to and use of eight financial services. Class numbers below follow the refit used for the conditional probability plot; in the earlier fit, and therefore in the class_membership column, classes 2 and 3 are interchanged. The figures in parentheses are the estimated probabilities of answering “Yes”.

  • Class 2 (about 14% of observations) shows the highest formal financial inclusion. It has by far the highest probability of saving in a bank (0.87) and the highest probabilities of holding insurance (0.17), saving through a cooperative (0.18), using microfinance savings (0.08), and using remittances (0.07), alongside moderate credit use (0.33) and informal savings (0.49).

  • Class 1 (about 26%) indicates moderate financial engagement built on credit and informal channels. It has the highest probabilities of credit use (0.59) and informal savings (0.73), but low bank savings (0.18) and almost no insurance or remittance use (both below 0.01).

  • Class 3 (about 60%) reflects limited financial engagement. Its members have essentially no credit, cooperative savings, or association-based savings, and low probabilities of bank savings (0.23) and informal savings (0.28). Use of remittances, microfinance savings, and insurance is close to zero.

These findings suggest meaningful behavioral and access-related differences across the population.

Conclusion and Future Use Cases

This analysis demonstrates how Latent Class Analysis (LCA) can effectively uncover hidden patterns in household financial behaviors, enabling more tailored financial inclusion strategies.

In business contexts, such segmentation can inform targeted financial product design, credit risk scoring, or customer outreach strategies. In the development sector, LCA can support program targeting for underserved groups. In healthcare, LCA can uncover behavioral subgroups for adherence, symptom management, or response to health financing programs. Finally, LCA offers a powerful tool for analyzing patient-reported outcomes, enabling more personalized interventions and resource allocation.

Future work may link class membership to household welfare indicators or simulate how shifts in service access could transition households across inclusion tiers.