pacman::p_load(ggplot2, gtsummary, poLCA, haven, tidyverse, dplyr, reshape2)
Latent Class Analysis
Project Introduction
Latent Class Analysis (LCA)
LCA is a statistical method used to identify unobserved (latent) subgroups within a population based on patterns in categorical data. It is particularly useful when individuals exhibit different combinations of behaviors or responses that are not easily captured by observed variables alone.
USE CASES
In a business context, LCA helps uncover hidden customer segments (latent classes) based on shared attributes or behaviors. This enables more targeted marketing, risk management, or product design strategies.
In clinical or observational studies, LCA can reveal hidden subgroups based on symptom profiles, biomarker patterns, adherence behaviors, or side effects. For instance, LCA can help group asthma patients based on self-reported symptom control, medication use, and health behaviors to identify phenotypes for targeted interventions.
With patient-reported outcomes (PROs), LCA can uncover distinct classes of patients based on how they report quality of life, functional status, or treatment satisfaction in PRO instruments.
In this tutorial, I apply LCA using the poLCA package to classify Nigerian households into distinct subgroups based on their access to and use of financial services, using binary indicators such as savings, access to credit, insurance, remittances and informal savings. This approach enables us to move beyond single-variable summaries and uncover meaningful patterns in financial behavior, which can inform inclusive financial policy, tailored interventions, and targeted outreach strategies to improve household welfare.
Let’s begin by loading the packages
Loading and preparing the data
I will load the cleaned data from the LSMS dataset.
data <- read_dta("/Users/richmondsilvanusbaye/Documents/analysis/fin_data.dta")
set.seed(43)
names(data)
[1] "hhid" "wave" "age" "hhsize"
[5] "remittance" "credit" "savings_ass" "savings_bank"
[9] "savings_coop" "savings_informal" "savings_microf" "insurance"
[13] "zone" "gender" "education" "location"
Frequency table
Among the 4,568 households surveyed, access to and use of financial services varied considerably. Remittance usage was notably low, with only 1.8% of households reporting use. Credit access was somewhat more prevalent at 20%. Savings through associations and cooperatives were rare (2.4% and 4.2%, respectively), while savings in formal banking institutions were more common, with 30% of households participating. Informal savings emerged as the most widely used option, reported by 43% of respondents. In contrast, use of microfinance services and insurance remained very limited, at 1.8% and 2.6%, respectively.
data |>
mutate(across(c(remittance, credit, savings_ass, savings_bank, savings_coop,
savings_informal, savings_microf, insurance),
~ factor(., levels = c(1, 2), labels = c("Yes", "No")))) |>
select(remittance, credit, savings_ass, savings_bank, savings_coop,
savings_informal, savings_microf, insurance) |>
tbl_summary(
type = all_categorical() ~ "categorical",
statistic = all_categorical() ~ "{n} ({p}%)",
missing = "no"
)
| Characteristic | N = 4,568¹ |
|---|---|
| remittance | |
| Yes | 83 (1.8%) |
| No | 4,485 (98%) |
| credit | |
| Yes | 903 (20%) |
| No | 3,665 (80%) |
| savings_ass | |
| Yes | 108 (2.4%) |
| No | 4,460 (98%) |
| savings_bank | |
| Yes | 1,392 (30%) |
| No | 3,176 (70%) |
| savings_coop | |
| Yes | 194 (4.2%) |
| No | 4,374 (96%) |
| savings_informal | |
| Yes | 1,946 (43%) |
| No | 2,622 (57%) |
| savings_microf | |
| Yes | 80 (1.8%) |
| No | 4,488 (98%) |
| insurance | |
| Yes | 121 (2.6%) |
| No | 4,447 (97%) |

¹ n (%)
Selecting Financial Inclusion Columns For Clustering
Our dataset consists of a mix of demographic variables and financial access and use variables. Since the goal is to cluster households by financial behavior and use, I selected the specific financial inclusion variables from the data.
f <- cbind(remittance, credit, savings_ass, savings_bank, savings_coop,
savings_informal, savings_microf, insurance) ~ 1
After selecting the variables, we can now implement the latent class analysis.
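poLCA expects every manifest variable in the formula to be coded as consecutive positive integers starting at 1, so a 0/1 indicator has to be recoded before fitting. The sketch below uses a small made-up data frame (`toy`, not the LSMS data) to show one way to check and fix the coding. The helper name `is_polca_coded` is hypothetical, and the sketch assumes that 1 means "Yes" in the 0/1 column.

```r
# Hypothetical toy data: one indicator already coded 1/2,
# one coded 0/1 (assumed here: 1 = Yes, 0 = No). Not the LSMS data.
toy <- data.frame(
  credit    = c(1, 2, 2, 1, 2),
  insurance = c(0, 1, 0, 0, 1)
)

# TRUE when a column holds only whole numbers that are 1 or larger
# (a partial check: it does not verify that the values are consecutive)
is_polca_coded <- function(x) {
  x <- x[!is.na(x)]
  all(x == round(x)) && min(x) >= 1
}

coded_ok <- vapply(toy, is_polca_coded, logical(1))
print(coded_ok)

# Recode the 0/1 column to 1 = Yes, 2 = No, matching the tutorial's coding
toy$insurance <- ifelse(toy$insurance == 1, 1, 2)
print(vapply(toy, is_polca_coded, logical(1)))
```

Running a check like this on the eight indicators before calling poLCA would catch a stray 0 or a non-integer code early.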
Latent Class Analysis
I will perform Latent Class Analysis (LCA) to determine the optimal number of classes.
I begin by implementing a series of latent class models ranging from 1 to 7 classes. For each specification, I then extract the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values to evaluate model fit. The model with the lowest BIC and the model with the lowest AIC are identified as the optimal solutions under each respective criterion.
# Initialize vectors to store BIC and AIC values
BIC_values <- numeric()
AIC_values <- numeric()
model_list <- list()
# Loop over number of classes from 1 to 7
for (n in 1:7) {
# Suppress poLCA output while fitting the model
lca_model <- suppressMessages(
capture.output(
poLCA(f, data, nclass = n, graphs = FALSE)
)
)
# Evaluate model and store manually (outside capture)
model <- poLCA(f, data, nclass = n, graphs = FALSE)
model_list[[n]] <- model
BIC_values[n] <- model$bic
AIC_values[n] <- model$aic
}
Conditional item response (column) probabilities,
by outcome variable, for each class (row)
$remittance
Pr(1) Pr(2)
class 1: 0.0182 0.9818
$credit
Pr(1) Pr(2)
class 1: 0.1977 0.8023
$savings_ass
Pr(1) Pr(2)
class 1: 0.0236 0.9764
$savings_bank
Pr(1) Pr(2)
class 1: 0.3047 0.6953
$savings_coop
Pr(1) Pr(2)
class 1: 0.0425 0.9575
$savings_informal
Pr(1) Pr(2)
class 1: 0.426 0.574
$savings_microf
Pr(1) Pr(2)
class 1: 0.0175 0.9825
$insurance
Pr(1) Pr(2)
class 1: 0.0265 0.9735
Estimated class population shares
1
Predicted class memberships (by modal posterior prob.)
1
=========================================================
Fit for 1 latent classes:
=========================================================
number of observations: 4568
number of estimated parameters: 8
residual degrees of freedom: 247
maximum log-likelihood: -10885.97
AIC(1): 21787.94
BIC(1): 21839.36
G^2(1): 1005.222 (Likelihood ratio/deviance statistic)
X^2(1): 2437.475 (Chi-square goodness of fit)
Conditional item response (column) probabilities,
by outcome variable, for each class (row)
$remittance
Pr(1) Pr(2)
class 1: 0.0159 0.9841
class 2: 0.0215 0.9785
$credit
Pr(1) Pr(2)
class 1: 0.0000 1.0000
class 2: 0.4917 0.5083
$savings_ass
Pr(1) Pr(2)
class 1: 0.0000 1.0000
class 2: 0.0588 0.9412
$savings_bank
Pr(1) Pr(2)
class 1: 0.2505 0.7495
class 2: 0.3853 0.6147
$savings_coop
Pr(1) Pr(2)
class 1: 0.0000 1.0000
class 2: 0.1056 0.8944
$savings_informal
Pr(1) Pr(2)
class 1: 0.2649 0.7351
class 2: 0.6656 0.3344
$savings_microf
Pr(1) Pr(2)
class 1: 0.0019 0.9981
class 2: 0.0407 0.9593
$insurance
Pr(1) Pr(2)
class 1: 0.0134 0.9866
class 2: 0.0460 0.9540
Estimated class population shares
0.5979 0.4021
Predicted class memberships (by modal posterior prob.)
0.6824 0.3176
=========================================================
Fit for 2 latent classes:
=========================================================
number of observations: 4568
number of estimated parameters: 17
residual degrees of freedom: 238
maximum log-likelihood: -10596.48
AIC(2): 21226.96
BIC(2): 21336.21
G^2(2): 426.2338 (Likelihood ratio/deviance statistic)
X^2(2): 702.9694 (Chi-square goodness of fit)
ALERT: iterations finished, MAXIMUM LIKELIHOOD NOT FOUND
Conditional item response (column) probabilities,
by outcome variable, for each class (row)
$remittance
Pr(1) Pr(2)
class 1: 0.0039 0.9961
class 2: 0.0657 0.9343
class 3: 0.0129 0.9871
$credit
Pr(1) Pr(2)
class 1: 0.5914 0.4086
class 2: 0.3266 0.6734
class 3: 0.0000 1.0000
$savings_ass
Pr(1) Pr(2)
class 1: 0.060 0.940
class 2: 0.054 0.946
class 3: 0.001 0.999
$savings_bank
Pr(1) Pr(2)
class 1: 0.1768 0.8232
class 2: 0.8656 0.1344
class 3: 0.2256 0.7744
$savings_coop
Pr(1) Pr(2)
class 1: 0.0655 0.9345
class 2: 0.1801 0.8199
class 3: 0.0000 1.0000
$savings_informal
Pr(1) Pr(2)
class 1: 0.7347 0.2653
class 2: 0.4927 0.5073
class 3: 0.2792 0.7208
$savings_microf
Pr(1) Pr(2)
class 1: 0.0175 0.9825
class 2: 0.0835 0.9165
class 3: 0.0018 0.9982
$insurance
Pr(1) Pr(2)
class 1: 0.0004 0.9996
class 2: 0.1676 0.8324
class 3: 0.0040 0.9960
Estimated class population shares
0.2552 0.1431 0.6017
Predicted class memberships (by modal posterior prob.)
0.1721 0.0865 0.7415
=========================================================
Fit for 3 latent classes:
=========================================================
number of observations: 4568
number of estimated parameters: 26
residual degrees of freedom: 229
maximum log-likelihood: -10473.25
AIC(3): 20998.51
BIC(3): 21165.61
G^2(3): 179.7855 (Likelihood ratio/deviance statistic)
X^2(3): 287.4108 (Chi-square goodness of fit)
Conditional item response (column) probabilities,
by outcome variable, for each class (row)
$remittance
Pr(1) Pr(2)
class 1: 0.0357 0.9643
class 2: 0.0091 0.9909
class 3: 0.0546 0.9454
class 4: 0.0043 0.9957
$credit
Pr(1) Pr(2)
class 1: 0.0000 1.0000
class 2: 0.0211 0.9789
class 3: 0.5484 0.4516
class 4: 0.5696 0.4304
$savings_ass
Pr(1) Pr(2)
class 1: 0.0176 0.9824
class 2: 0.0000 1.0000
class 3: 0.0462 0.9538
class 4: 0.0637 0.9363
$savings_bank
Pr(1) Pr(2)
class 1: 0.8150 0.1850
class 2: 0.0008 0.9992
class 3: 0.8216 0.1784
class 4: 0.1817 0.8183
$savings_coop
Pr(1) Pr(2)
class 1: 0.0205 0.9795
class 2: 0.0000 1.0000
class 3: 0.2519 0.7481
class 4: 0.0597 0.9403
$savings_informal
Pr(1) Pr(2)
class 1: 0.3096 0.6904
class 2: 0.2720 0.7280
class 3: 0.5606 0.4394
class 4: 0.7640 0.2360
$savings_microf
Pr(1) Pr(2)
class 1: 0.0103 0.9897
class 2: 0.0020 0.9980
class 3: 0.1166 0.8834
class 4: 0.0144 0.9856
$insurance
Pr(1) Pr(2)
class 1: 0.0561 0.9439
class 2: 0.0000 1.0000
class 3: 0.1479 0.8521
class 4: 0.0002 0.9998
Estimated class population shares
0.226 0.4398 0.0929 0.2412
Predicted class memberships (by modal posterior prob.)
0.225 0.5462 0.0567 0.1721
=========================================================
Fit for 4 latent classes:
=========================================================
number of observations: 4568
number of estimated parameters: 35
residual degrees of freedom: 220
maximum log-likelihood: -10456.36
AIC(4): 20982.73
BIC(4): 21207.66
G^2(4): 146.003 (Likelihood ratio/deviance statistic)
X^2(4): 197.5178 (Chi-square goodness of fit)
ALERT: iterations finished, MAXIMUM LIKELIHOOD NOT FOUND
Conditional item response (column) probabilities,
by outcome variable, for each class (row)
$remittance
Pr(1) Pr(2)
class 1: 0.0456 0.9544
class 2: 0.0000 1.0000
class 3: 0.0316 0.9684
class 4: 0.0096 0.9904
class 5: 0.0612 0.9388
$credit
Pr(1) Pr(2)
class 1: 0.5346 0.4654
class 2: 0.9621 0.0379
class 3: 0.2220 0.7780
class 4: 0.0000 1.0000
class 5: 0.0000 1.0000
$savings_ass
Pr(1) Pr(2)
class 1: 0.0356 0.9644
class 2: 0.0537 0.9463
class 3: 0.0964 0.9036
class 4: 0.0040 0.9960
class 5: 0.0156 0.9844
$savings_bank
Pr(1) Pr(2)
class 1: 0.8330 0.1670
class 2: 0.0655 0.9345
class 3: 0.4860 0.5140
class 4: 0.1451 0.8549
class 5: 1.0000 0.0000
$savings_coop
Pr(1) Pr(2)
class 1: 0.2166 0.7834
class 2: 0.0512 0.9488
class 3: 0.0744 0.9256
class 4: 0.0044 0.9956
class 5: 0.0414 0.9586
$savings_informal
Pr(1) Pr(2)
class 1: 0.5470 0.4530
class 2: 0.6959 0.3041
class 3: 0.9996 0.0004
class 4: 0.2955 0.7045
class 5: 0.0628 0.9372
$savings_microf
Pr(1) Pr(2)
class 1: 0.1166 0.8834
class 2: 0.0206 0.9794
class 3: 0.0000 1.0000
class 4: 0.0035 0.9965
class 5: 0.0072 0.9928
$insurance
Pr(1) Pr(2)
class 1: 0.1331 0.8669
class 2: 0.0000 1.0000
class 3: 0.0134 0.9866
class 4: 0.0037 0.9963
class 5: 0.1214 0.8786
Estimated class population shares
0.1063 0.123 0.1013 0.5972 0.0722
Predicted class memberships (by modal posterior prob.)
0.0893 0.1292 0.09 0.6756 0.016
=========================================================
Fit for 5 latent classes:
=========================================================
number of observations: 4568
number of estimated parameters: 44
residual degrees of freedom: 211
maximum log-likelihood: -10445.38
AIC(5): 20978.76
BIC(5): 21261.54
G^2(5): 124.0342 (Likelihood ratio/deviance statistic)
X^2(5): 194.8473 (Chi-square goodness of fit)
ALERT: iterations finished, MAXIMUM LIKELIHOOD NOT FOUND
Conditional item response (column) probabilities,
by outcome variable, for each class (row)
$remittance
Pr(1) Pr(2)
class 1: 0.0686 0.9314
class 2: 0.0091 0.9909
class 3: 0.0621 0.9379
class 4: 0.0029 0.9971
class 5: 0.0104 0.9896
class 6: 0.0337 0.9663
$credit
Pr(1) Pr(2)
class 1: 0.4712 0.5288
class 2: 0.0000 1.0000
class 3: 0.0000 1.0000
class 4: 0.9933 0.0067
class 5: 0.6104 0.3896
class 6: 0.1035 0.8965
$savings_ass
Pr(1) Pr(2)
class 1: 0.0212 0.9788
class 2: 0.0006 0.9994
class 3: 0.0000 1.0000
class 4: 0.0535 0.9465
class 5: 0.0599 0.9401
class 6: 0.1255 0.8745
$savings_bank
Pr(1) Pr(2)
class 1: 0.8607 0.1393
class 2: 0.1468 0.8532
class 3: 1.0000 0.0000
class 4: 0.1428 0.8572
class 5: 0.5277 0.4723
class 6: 0.5339 0.4661
$savings_coop
Pr(1) Pr(2)
class 1: 0.1068 0.8932
class 2: 0.0027 0.9973
class 3: 0.0645 0.9355
class 4: 0.0000 1.0000
class 5: 0.6478 0.3522
class 6: 0.0000 1.0000
$savings_informal
Pr(1) Pr(2)
class 1: 0.5200 0.4800
class 2: 0.3114 0.6886
class 3: 0.0978 0.9022
class 4: 0.7107 0.2893
class 5: 0.7573 0.2427
class 6: 0.7489 0.2511
$savings_microf
Pr(1) Pr(2)
class 1: 0.1883 0.8117
class 2: 0.0033 0.9967
class 3: 0.0000 1.0000
class 4: 0.0220 0.9780
class 5: 0.0000 1.0000
class 6: 0.0000 1.0000
$insurance
Pr(1) Pr(2)
class 1: 0.1541 0.8459
class 2: 0.0044 0.9956
class 3: 0.1444 0.8556
class 4: 0.0000 1.0000
class 5: 0.0899 0.9101
class 6: 0.0005 0.9995
Estimated class population shares
0.0674 0.596 0.0648 0.1289 0.0455 0.0974
Predicted class memberships (by modal posterior prob.)
0.0346 0.6718 0.0208 0.1517 0.0348 0.0863
=========================================================
Fit for 6 latent classes:
=========================================================
number of observations: 4568
number of estimated parameters: 53
residual degrees of freedom: 202
maximum log-likelihood: -10436.08
AIC(6): 20978.16
BIC(6): 21318.78
G^2(6): 105.4347 (Likelihood ratio/deviance statistic)
X^2(6): 218.9361 (Chi-square goodness of fit)
ALERT: iterations finished, MAXIMUM LIKELIHOOD NOT FOUND
Conditional item response (column) probabilities,
by outcome variable, for each class (row)
$remittance
Pr(1) Pr(2)
class 1: 0.0633 0.9367
class 2: 0.0038 0.9962
class 3: 0.0581 0.9419
class 4: 0.0174 0.9826
class 5: 0.0037 0.9963
class 6: 0.0905 0.9095
class 7: 0.0497 0.9503
$credit
Pr(1) Pr(2)
class 1: 0.0585 0.9415
class 2: 0.5647 0.4353
class 3: 0.5053 0.4947
class 4: 0.0000 1.0000
class 5: 0.0031 0.9969
class 6: 0.0000 1.0000
class 7: 0.6088 0.3912
$savings_ass
Pr(1) Pr(2)
class 1: 0.0000 1.0000
class 2: 0.0574 0.9426
class 3: 0.0000 1.0000
class 4: 0.0010 0.9990
class 5: 0.0000 1.0000
class 6: 0.4184 0.5816
class 7: 0.0522 0.9478
$savings_bank
Pr(1) Pr(2)
class 1: 1.0000 0.0000
class 2: 0.1767 0.8233
class 3: 0.7306 0.2694
class 4: 0.2540 0.7460
class 5: 0.0671 0.9329
class 6: 1.0000 0.0000
class 7: 0.8061 0.1939
$savings_coop
Pr(1) Pr(2)
class 1: 0.0638 0.9362
class 2: 0.0500 0.9500
class 3: 0.0696 0.9304
class 4: 0.0031 0.9969
class 5: 0.0000 1.0000
class 6: 0.0000 1.0000
class 7: 0.3616 0.6384
$savings_informal
Pr(1) Pr(2)
class 1: 0.1418 0.8582
class 2: 0.7236 0.2764
class 3: 0.4342 0.5658
class 4: 0.4299 0.5701
class 5: 0.0307 0.9693
class 6: 0.7691 0.2309
class 7: 0.6740 0.3260
$savings_microf
Pr(1) Pr(2)
class 1: 0.0000 1.0000
class 2: 0.0092 0.9908
class 3: 0.9999 0.0001
class 4: 0.0000 1.0000
class 5: 0.0028 0.9972
class 6: 0.0376 0.9624
class 7: 0.0000 1.0000
$insurance
Pr(1) Pr(2)
class 1: 0.1458 0.8542
class 2: 0.0000 1.0000
class 3: 0.1241 0.8759
class 4: 0.0093 0.9907
class 5: 0.0000 1.0000
class 6: 0.0000 1.0000
class 7: 0.1613 0.8387
Estimated class population shares
0.0774 0.262 0.0141 0.3784 0.1946 0.012 0.0615
Predicted class memberships (by modal posterior prob.)
0.0239 0.1845 0.0153 0.3897 0.354 0.0068 0.0258
=========================================================
Fit for 7 latent classes:
=========================================================
number of observations: 4568
number of estimated parameters: 62
residual degrees of freedom: 193
maximum log-likelihood: -10432.27
AIC(7): 20988.54
BIC(7): 21387
G^2(7): 97.81382 (Likelihood ratio/deviance statistic)
X^2(7): 170.8666 (Chi-square goodness of fit)
ALERT: iterations finished, MAXIMUM LIKELIHOOD NOT FOUND
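Several of the fits above end with the alert "MAXIMUM LIKELIHOOD NOT FOUND", which poLCA prints when the estimation stops at the iteration cap instead of converging. One common response is to raise `maxiter` and to refit from several random starting points with `nrep`, keeping the solution with the highest log-likelihood. The sketch below illustrates this on the `values` example data that ships with poLCA, so that it runs without the LSMS file. The names `f_demo`, `max_iter`, and `fit`, and the settings of 5000 iterations and 10 starts, are illustrative assumptions rather than the values used in this tutorial.

```r
library(poLCA)

# Example data bundled with poLCA (four binary items A to D),
# used only so the sketch is self-contained
data(values)
f_demo <- cbind(A, B, C, D) ~ 1

max_iter <- 5000  # illustrative cap, above poLCA's default of 1000

set.seed(43)
fit <- poLCA(f_demo, values, nclass = 2,
             maxiter = max_iter,  # allow more EM iterations
             nrep = 10,           # 10 random starts; the best log-likelihood is kept
             graphs = FALSE, verbose = FALSE)

# If the number of iterations used equals the cap, the fit did not converge
converged <- fit$numiter < max_iter
cat("Iterations used:", fit$numiter, "- converged:", converged, "\n")
cat("BIC:", fit$bic, "\n")
```

The same two arguments could be added to the `poLCA()` call inside the loop, at the cost of longer run time for the larger models.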
# Identify the best number of classes
best_BIC_nclass <- which.min(BIC_values)
best_AIC_nclass <- which.min(AIC_values)
# Print summary of model selection
cat("The best model based on BIC is with nclass =", best_BIC_nclass,
"with BIC =", BIC_values[best_BIC_nclass], "\n")
The best model based on BIC is with nclass = 3 with BIC = 21165.61
cat("The best model based on AIC is with nclass =", best_AIC_nclass,
"with AIC =", AIC_values[best_AIC_nclass], "\n")
The best model based on AIC is with nclass = 6 with AIC = 20978.16
As shown, BIC favors the 3-class model and AIC favors the 6-class model. I will go with the BIC solution for parsimony: BIC penalizes model complexity more heavily than AIC, so it favors the simpler solution. Note also that the 6-class fit ended with a "MAXIMUM LIKELIHOOD NOT FOUND" alert, whereas the 3-class fit in the loop did not. With this in mind, I can proceed to fit the optimal model to identify the latent classes.
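To make the difference in penalties concrete, both criteria can be recomputed by hand from the quantities poLCA reports. The sketch below uses the standard definitions, AIC = -2 logL + 2k and BIC = -2 logL + k ln(N), with the log-likelihood and sample size printed above for the 3-class model. The variable names are illustrative, and the parameter count assumes binary indicators only.

```r
# Values reported above for the 3-class model
log_lik <- -10473.25  # maximum log-likelihood
n_obs   <- 4568       # number of observations
n_class <- 3
n_items <- 8          # binary indicators in the formula

# Free parameters with binary items: one Pr(Yes) per item per class,
# plus (n_class - 1) class shares
n_par <- n_class * n_items + (n_class - 1)

aic <- -2 * log_lik + 2 * n_par
bic <- -2 * log_lik + log(n_obs) * n_par

cat("parameters:", n_par, "\n")
cat("AIC:", round(aic, 2), " BIC:", round(bic, 2), "\n")
cat("penalty per parameter: AIC = 2, BIC =", round(log(n_obs), 2), "\n")
```

Up to rounding of the log-likelihood, this reproduces the AIC and BIC printed for the 3-class fit. With 4,568 observations, each extra parameter costs roughly four times as much under BIC as under AIC, which is why the two criteria can disagree on the number of classes.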
Fitting the Optimal Model
Fitting the LCA model with 3 classes. I don’t like the aesthetics of the default plot, so I set graphs to FALSE. Feel free to set it to TRUE.
model <- poLCA(f, data, nclass = 3, graphs = FALSE)
Conditional item response (column) probabilities,
by outcome variable, for each class (row)
$remittance
Pr(1) Pr(2)
class 1: 0.0039 0.9961
class 2: 0.0129 0.9871
class 3: 0.0657 0.9343
$credit
Pr(1) Pr(2)
class 1: 0.5914 0.4086
class 2: 0.0000 1.0000
class 3: 0.3266 0.6734
$savings_ass
Pr(1) Pr(2)
class 1: 0.060 0.940
class 2: 0.001 0.999
class 3: 0.054 0.946
$savings_bank
Pr(1) Pr(2)
class 1: 0.1768 0.8232
class 2: 0.2256 0.7744
class 3: 0.8656 0.1344
$savings_coop
Pr(1) Pr(2)
class 1: 0.0655 0.9345
class 2: 0.0000 1.0000
class 3: 0.1801 0.8199
$savings_informal
Pr(1) Pr(2)
class 1: 0.7347 0.2653
class 2: 0.2792 0.7208
class 3: 0.4927 0.5073
$savings_microf
Pr(1) Pr(2)
class 1: 0.0175 0.9825
class 2: 0.0018 0.9982
class 3: 0.0835 0.9165
$insurance
Pr(1) Pr(2)
class 1: 0.0004 0.9996
class 2: 0.0040 0.9960
class 3: 0.1676 0.8324
Estimated class population shares
0.2552 0.6017 0.1431
Predicted class memberships (by modal posterior prob.)
0.1721 0.7415 0.0865
=========================================================
Fit for 3 latent classes:
=========================================================
number of observations: 4568
number of estimated parameters: 26
residual degrees of freedom: 229
maximum log-likelihood: -10473.25
AIC(3): 20998.51
BIC(3): 21165.61
G^2(3): 179.7855 (Likelihood ratio/deviance statistic)
X^2(3): 287.4135 (Chi-square goodness of fit)
The estimated class population shares are 26%, 60%, and 14%. However, class sizes alone tell us nothing about how the households in each class behave.
class_probs <- model$P # This gives the probability of membership in each class
# Create a data frame for plotting
class_df <- data.frame(
Class = 1:length(class_probs),
Probability = class_probs
)
class_df
  Class Probability
1 1 0.2552393
2 2 0.6016953
3 3 0.1430655
Plotting Class Membership Probabilities
To visualize the class sizes, we use the data frame class_df to plot it.
ggplot(class_df, aes(x = factor(Class), y = Probability)) +
geom_bar(stat = "identity", fill = "skyblue") +
theme_minimal() +
labs(x = "Latent Class", y = "Membership Probability",
title = "Class Membership Probabilities")
Predicting Class Membership
We predict the class membership for each household.
predicted_class <- model$predclass
data$class_membership <- predicted_class
head(data)
# A tibble: 6 × 17
hhid wave age hhsize remittance credit savings_ass savings_bank
<dbl> <dbl> <dbl> <dbl> <dbl+lbl> <dbl+lbl> <dbl+lbl> <dbl+lbl>
1 10001 1 50 7 2 [No] 2 [No] 2 [No] 1 [Yes]
2 10001 2 53 8 2 [No] 2 [No] 2 [No] 1 [Yes]
3 10001 3 52 6 2 [No] 2 [No] 2 [No] 1 [Yes]
4 10001 4 55 6 2 [No] 2 [No] 2 [No] 1 [Yes]
5 10002 1 44 7 2 [No] 2 [No] 2 [No] 1 [Yes]
6 10002 2 46 8 2 [No] 2 [No] 2 [No] 1 [Yes]
# ℹ 9 more variables: savings_coop <dbl+lbl>, savings_informal <dbl+lbl>,
# savings_microf <dbl+lbl>, insurance <dbl+lbl>, zone <dbl+lbl>,
# gender <dbl+lbl>, education <dbl+lbl>, location <dbl+lbl>,
# class_membership <int>
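The predicted class is the class with the highest posterior probability for a household's response pattern, and poLCA also stores the full posterior matrix in `model$posterior`. As a worked sketch of that calculation, the code below applies Bayes' rule to the class shares and Pr(Yes) values printed under "Fitting the Optimal Model". The household is hypothetical (it uses credit and informal savings only), and the names `shares`, `p_yes`, and `uses` are illustrative.

```r
# Class shares and Pr(Yes) by class, copied from the 3-class output above
shares <- c(0.2552, 0.6017, 0.1431)
p_yes <- rbind(
  remittance       = c(0.0039, 0.0129, 0.0657),
  credit           = c(0.5914, 0.0000, 0.3266),
  savings_ass      = c(0.0600, 0.0010, 0.0540),
  savings_bank     = c(0.1768, 0.2256, 0.8656),
  savings_coop     = c(0.0655, 0.0000, 0.1801),
  savings_informal = c(0.7347, 0.2792, 0.4927),
  savings_microf   = c(0.0175, 0.0018, 0.0835),
  insurance        = c(0.0004, 0.0040, 0.1676)
)

# Hypothetical household: uses credit and informal savings, nothing else
uses <- rownames(p_yes) %in% c("credit", "savings_informal")

# Probability of each observed answer within each class:
# Pr(Yes) for services used, 1 - Pr(Yes) for services not used
p_pattern <- p_yes
p_pattern[!uses, ] <- 1 - p_yes[!uses, ]

# Likelihood of the whole pattern per class, assuming the items are
# independent given the class (the local independence assumption of LCA)
lik <- apply(p_pattern, 2, prod)

# Bayes' rule: posterior is proportional to class share times likelihood
posterior <- shares * lik / sum(shares * lik)
print(round(posterior, 4))
cat("Modal class:", which.max(posterior), "\n")
```

Because the second class has an estimated credit probability of zero, any household that reports credit gets a posterior of zero for that class under these estimates.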
To determine which latent class represents low, moderate, or high financial inclusion (FI), we need to examine the conditional response probabilities, i.e., the probability of answering “Yes” to each financial access or use variable within each class. To do that, we extract the conditional item-response probabilities from the fitted model.
Extracting Conditional Probabilities
We extract the conditional probabilities for all variables.
# Reuse the model fitted above instead of refitting it: poLCA can number the
# classes differently on each run, which would leave class_membership out of
# step with the probabilities extracted here.
cond_probs <- model$probs
Creating a Combined Data Frame for Plotting
Having extracted the conditional probabilities, we can now build a data frame that lets us visualize the probability of answering “Yes” to each financial access or use question within each class.
melted_data <- list()
for (var_name in names(cond_probs)) {
var_probs <- as.data.frame(cond_probs[[var_name]])
var_probs$Class <- factor(1:nrow(var_probs))
var_probs_melt <- melt(var_probs, id.vars = "Class")
var_probs_melt$Variable <- var_name
melted_data[[var_name]] <- var_probs_melt
}
plot_data <- do.call(rbind, melted_data)
# Pr(1) is the probability of response 1 ("Yes") and Pr(2) of response 2 ("No")
plot_data$variable <- factor(plot_data$variable, levels = c("Pr(1)", "Pr(2)"), labels = c("Yes", "No"))
plot_data$Variable <- factor(plot_data$Variable, levels = c("credit", "insurance", "remittance", "savings_ass",
"savings_bank", "savings_coop", "savings_informal",
"savings_microf"),
labels = c("Credit", "Insurance", "Remittance", "Savings association",
"Savings bank", "Savings cooperative", "Informal savings",
"Microfinance savings"))
Plotting Conditional Probabilities by Latent Class and Variable
ggplot(plot_data, aes(x = Class, y = value, fill = variable)) +
geom_bar(stat = "identity", position = "dodge") +
facet_wrap(~ Variable, scales = "free_y") +
theme_minimal() +
labs(x = "Latent Class", y = "Conditional Probability",
title = "Conditional Probabilities by Latent Class and Variable") +
scale_fill_manual(values = c("No" = "white", "Yes" = "orange")) +
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 0, hjust = 1),
legend.position = "none")
Discussion
The analysis identified three distinct financial behavior profiles among Nigerian households based on their access to and use of eight financial services. Class numbers below follow the model fitted under “Fitting the Optimal Model”; poLCA assigns these labels arbitrarily, so the numbering can change if the model is refit.
Class 3 (about 14% of households) represents the highest financial inclusion. It has the highest probabilities of bank savings (0.87), cooperative savings (0.18), insurance (0.17), microfinance savings (0.08), and remittances (0.07), together with moderate use of credit (0.33) and informal savings (0.49).
Class 2 (about 60%) reflects households with limited financial engagement. It shows essentially no use of credit, association savings, or cooperative savings, and negligible use of microfinance savings, remittances, and insurance. Bank savings (0.23) and informal savings (0.28) are the only services with appreciable uptake.
Class 1 (about 26%) indicates moderate financial engagement centered on credit and informal savings. It has the highest probabilities of credit (0.59) and informal savings (0.73), but low bank savings (0.18) and almost no use of insurance or remittances.
These findings suggest meaningful behavioral and access-related differences across the population.
Conclusion and Future Use Cases
This analysis demonstrates how Latent Class Analysis (LCA) can effectively uncover hidden patterns in household financial behaviors, enabling more tailored financial inclusion strategies.
In business contexts, such segmentation can inform targeted financial product design, credit risk scoring, or customer outreach strategies. In the development sector, LCA can support program targeting for underserved groups. In healthcare, LCA can uncover behavioral subgroups for adherence, symptom management, or response to health financing programs. Finally, LCA offers a powerful tool for analyzing patient-reported outcomes, enabling more personalized interventions and resource allocation.
Future work may link class membership to household welfare indicators or simulate how shifts in service access could transition households across inclusion tiers.