The aim of the InBillo project was to build an automated scoring technology that assesses the economic situation and cooperation risk of market participants, using general information available online and basic firm characteristics. The system was tested on tens of thousands of scoring attempts and marked around 10% of entities as untrusted.
This report uses a reference sample of that dataset and applies association rule mining with the Apriori algorithm to find interpretable patterns related to customer trust. The sample is available on Poland's open data portal: https://dane.gov.pl/pl/dataset/3572,inbillo/resource/54309/table. The main goal of this project is to determine which combinations of non-score business attributes (e.g. age, size, legal form, VAT status, debtor indicator, subsidies, online footprint) are strongly associated with low customer trust.
The sample dataset contains 1000 rows with firm attributes and scoring outputs such as:
Score components:
refer_scorerefer_score (main risk score),
refer_scorecustomer_trust (customer trust score),
refer_scoredevelopment_advance (technology advancement score),
refer_scoreorganization_maturity (organizational maturity score),
refer_scorepayment_morality (payment morality score).
Business attributes:
activity_firmy (firm active status),
company_dataemployment (employment category),
company_datalegal_datalegal_form (legal form),
company_dataestablishment_date (establishment date),
dluznicy_mr (debt amount),
dotacje_sudopsuma_swiadczen (subsidies),
whiteliststatusVat (presence on the White list of VAT),
wwwsocialmedia_list (social media presence list),
wwwtechnologies_list (used technology list).
The Apriori algorithm works on sets of discrete items (transactions). Because the dataset includes numeric scores and mixed formats, each row has to be converted into a “basket” of categorical items.
The following standard rule interest measures were used:
Support: share of all transactions that contain the itemset or rule (starting threshold: 0.05),
Confidence: conditional probability of the RHS given the LHS (starting threshold: 0.65),
Lift: confidence divided by the frequency of the RHS, where lift > 1 indicates a positive association (starting threshold: 1.2).
These thresholds can later be tuned to control the number and strength of the rules.
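The three measures can be computed by hand on a handful of baskets. The sketch below uses base R only; the baskets and item names are made up for illustration and are not taken from the InBillo sample.

```r
# Toy baskets (hypothetical items, not taken from the InBillo sample)
baskets <- list(
  c("Size=1-10",  "Debtor=Unknown", "Trust=Low"),
  c("Size=1-10",  "Debtor=Unknown", "Trust=Low"),
  c("Size=1-10",  "Debtor=No",      "Trust=High"),
  c("Size=11-50", "Debtor=Unknown", "Trust=Low"),
  c("Size=11-50", "Debtor=No",      "Trust=High")
)

# Share of baskets that contain every item in `items`
support <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- c("Size=1-10", "Debtor=Unknown")
rhs <- "Trust=Low"

rule_support    <- support(c(lhs, rhs))            # P(LHS and RHS)
rule_confidence <- rule_support / support(lhs)     # P(RHS | LHS)
rule_lift       <- rule_confidence / support(rhs)  # confidence relative to the RHS base rate
```

In this toy set the rule holds in 2 of 5 baskets, every basket matching the LHS also contains the RHS, and the RHS occurs in 3 of 5 baskets, so the lift is above 1.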
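library(readr)    # read_csv2
library(stringr)  # str_* helpers used by the parsers
library(arules)   # transactions, apriori, is.redundant
DATA_PATH <- "inBillo.csv"
df <- read_csv2(DATA_PATH, show_col_types = FALSE)
summary(df)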
## _id date refer_scorerefer_score
## Length:1000 Length:1000 Min. : 378
## Class :character Class :character 1st Qu.:35298
## Mode :character Mode :character Median :38960
## Mean :43743
## 3rd Qu.:43225
## Max. :99984
##
## refer_scorecustomer_trust refer_scoredevelopment_advance
## Min. : 72 Min. : 100
## 1st Qu.: 72 1st Qu.:76495
## Median : 72 Median :77504
## Mean :26396 Mean :65904
## 3rd Qu.:57361 3rd Qu.:79931
## Max. :99999 Max. :99999
##
## refer_scoreorganization_maturity refer_scorepayment_morality
## Min. : 0 Length:1000
## 1st Qu.: 100 Class :character
## Median : 100 Mode :character
## Mean : 2868
## 3rd Qu.: 100
## Max. :99999
##
## wwwdomain activity_firmy company_dataemployment
## Length:1000 Length:1000 Length:1000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## company_datalegal_datalegal_form company_dataestablishment_date
## Length:1000 Length:1000
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
## dluznicy_mr dotacje_sudopsuma_swiadczen whiteliststatusVat
## Min. : 1312 Length:1000 Length:1000
## 1st Qu.: 314098 Class :character Class :character
## Median : 1182152 Mode :character Mode :character
## Mean : 8547942
## 3rd Qu.: 5340970
## Max. :469632239
## NA's :526
## wwwsocialmedia_list wwwtechnologies_list
## Length:1000 Length:1000
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
The raw sample contains heterogeneous data formats: dates and numeric values stored as strings, numeric scores, and list fields representing online presence. To enable association rule mining, dedicated parsers were implemented to standardize dates, numeric amounts and list fields into consistent formats. Continuous score variables were categorized using quantile-based binning, which reduces noise and turns numerical indicators into categorical tokens suitable for transaction-based analysis.
The categories were established as follows:
score variables have been divided into three bins: Low/Mid/High,
firm age has been divided into bins: 0-3y, 3-10y, 10-20y, 20y+,
debtor and subsidy have been categorized as Yes/No/Unknown.
List-like features have also been extracted from the columns:
social media platforms, e.g. SM=Facebook,
web technologies, e.g. TECH=Google Analytics.
# Parsers
strip_ordinal <- function(s) stringr::str_replace_all(s, "(\\d+)(st|nd|rd|th)", "\\1")
parse_dt <- function(x){
  if (is.na(x) || stringr::str_trim(x) == "") return(as.POSIXct(NA))
  s <- strip_ordinal(as.character(x))
  s <- sub("\\.[0-9]+$", "", s) # remove fractional seconds such as .000
  # try each accepted layout in turn; the first one that parses wins
  formats <- c("%B %e %Y, %H:%M:%S", "%B %d %Y, %H:%M:%S", "%B %e %Y", "%B %d %Y")
  for (fmt in formats) {
    dt <- strptime(s, format = fmt, tz = "UTC")
    if (!is.na(dt[1])) return(as.POSIXct(dt))
  }
  as.POSIXct(NA)
}
parse_amount <- function(x){
  if (is.na(x) || str_trim(x) == "") return(NA_real_)
  # drop comma separators before converting to numeric
  suppressWarnings(as.numeric(str_replace_all(as.character(x), ",", "")))
}
parse_list <- function(x){
  if (is.na(x)) return(character(0))
  s <- str_trim(as.character(x))
  if (s == "" || s == "[]" || tolower(s) %in% c("nan","none")) return(character(0))
  # strip brackets and quotes, then split on commas
  s <- str_remove_all(s, "\\[|\\]|\"|\'")
  parts <- str_trim(unlist(str_split(s, ",")))
  unique(parts[parts != ""])
}
# Quantile binning
SCORE_LABELS <- c("Low", "Mid", "High") # the three score bins described above
bin_quantile_3 <- function(v){
  v <- suppressWarnings(as.numeric(v))
  out <- rep(NA_character_, length(v))
  ok <- !is.na(v)
  if (!any(ok)) return(out)
  qs <- quantile(v[ok], probs = c(1/3, 2/3), na.rm = TRUE, names = FALSE)
  # fallback when both tertile cut points coincide: bin by relative rank instead
  if (length(unique(qs)) < 2) {
    r <- rank(v[ok], ties.method = "average") / sum(ok)
    out[ok] <- as.character(cut(
      r,
      breaks = c(0, 1/3, 2/3, 1),
      labels = SCORE_LABELS,
      include.lowest = TRUE
    ))
    return(out)
  }
  out[ok] <- as.character(cut(
    v[ok],
    breaks = c(-Inf, qs[1], qs[2], Inf),
    labels = SCORE_LABELS,
    include.lowest = TRUE
  ))
  out
}
# Tokenizer
mk <- function(prefix, value){
  if (length(value) == 0) return(NA_character_)
  if (length(value) > 1) value <- value[[1]]
  value <- as.character(value)
  if (is.na(value) || str_trim(value) == "" || tolower(value) %in% c("nan","none")) {
    return(NA_character_)
  }
  paste0(prefix, "=", value)
}
# Lists
mk_list_items <- function(prefix, x){
  vals <- parse_list(x)
  if (length(vals) == 0) return(character(0))
  paste0(prefix, "=", vals)
}
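The transaction step uses derived columns (Age, Debtor, Subsidy and the score bins) whose construction is not shown in this report. The sketch below illustrates one way such columns could be derived in base R. The reference date, the interval boundaries and the Yes/No/Unknown rule are assumptions made for illustration, and the names `toy`, `REF_DATE` and `to_indicator` are hypothetical.

```r
# Toy input; dluznicy_mr mirrors the raw column name, all values are made up
toy <- data.frame(
  established = as.Date(c("2021-06-01", "2015-03-15", "2008-01-10", "1995-09-30")),
  dluznicy_mr = c(NA, 0, 15000, NA)
)

# Assumed reference date for computing firm age
REF_DATE <- as.Date("2024-01-01")
age_years <- as.numeric(REF_DATE - toy$established) / 365.25

# Age bins named in the text; left-closed intervals [0,3), [3,10), ... are an assumption
toy$Age <- as.character(cut(
  age_years,
  breaks = c(0, 3, 10, 20, Inf),
  labels = c("0-3y", "3-10y", "10-20y", "20y+"),
  right = FALSE
))

# Assumed rule: missing amount -> Unknown, positive amount -> Yes, otherwise No
to_indicator <- function(amount) {
  ifelse(is.na(amount), "Unknown", ifelse(amount > 0, "Yes", "No"))
}
toy$Debtor <- to_indicator(toy$dluznicy_mr)
```

Treating a missing amount as its own Unknown category, rather than dropping the row, keeps every firm in the transaction set and lets missingness itself appear as an item in the rules.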
After parsing and categorizing the selected features, each observation was transformed into a transaction of categorical items describing the firm. Each transaction combines activity status, firm size (employment category), legal form, VAT status, debtor and subsidy indicators, age category, score bins, and social media and technology presence. The resulting dataset contains 1000 transactions and 125 distinct items.
# Building transactions
# Debtor, Subsidy, Age and the *_bin columns are derived features added to df beforehand
tx_list <- lapply(seq_len(nrow(df)), function(i){
  row <- as.list(df[i, ])
  items <- c(
    mk("Activity", row$activity_firmy),
    mk("Size", row$company_dataemployment),
    mk("LegalForm", row$company_datalegal_datalegal_form),
    mk("VAT", row$whiteliststatusVat),
    mk("Debtor", row$Debtor),
    mk("Subsidy", row$Subsidy),
    mk("Age", row$Age),
    # bins as categorical
    unlist(lapply(score_cols, function(sc) mk(sc, row[[paste0(sc, "_bin")]]))),
    mk_list_items("SM", row$wwwsocialmedia_list),
    mk_list_items("TECH", row$wwwtechnologies_list)
  )
  # drop missing tokens and duplicates within a transaction
  items <- items[!is.na(items)]
  unique(items)
})
trans <- as(tx_list, "transactions")
summary(trans)
## transactions as itemMatrix in sparse format with
## 1000 rows (elements/itemsets/transactions) and
## 125 columns (items) and a density of 0.12132
##
## most frequent items:
## Activity=Aktywny refer_scoreorganization_maturity=Mid
## 1000 953
## Subsidy=Yes VAT=Czynny
## 915 854
## refer_scorecustomer_trust=Low (Other)
## 610 10833
##
## element (itemset/transaction) length distribution:
## sizes
## 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 27
## 3 18 173 164 137 113 94 81 70 59 46 14 16 6 2 3 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.00 13.00 15.00 15.16 17.00 27.00
##
## includes extended item information - examples:
## labels
## 1 Activity=Aktywny
## 2 Age=0-3y
## 3 Age=10-20y
Inspection shows that every company in the dataset is still active and operational: the Activity=Aktywny item appears in all 1000 transactions. Keeping this constant attribute would only yield trivial rules, so it is removed before mining.
Using a minimum support threshold of 5% and limiting the maximum itemset length to four, the Apriori algorithm identified 8147 frequent itemsets. The most frequent patterns are dominated by medium organizational maturity, subsidy reception, VAT activity, and low customer trust indicators, reflecting common structural and financial characteristics across the analyzed firms.
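MIN_SUP  <- 0.05  # minimum support
MIN_CONF <- 0.65  # minimum confidence
MIN_LIFT <- 1.2   # minimum lift
MAXLEN   <- 4     # maximum itemset / rule length
# Remove constant items (mainly Activity=Aktywny)
trans <- trans[, itemFrequency(trans) < 0.99]
# Identifying frequent itemsets
freq_sets <- apriori(trans, parameter = list(supp = MIN_SUP, target = "frequent itemsets", maxlen = MAXLEN))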
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## NA 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 4 frequent itemsets TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 50
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[124 item(s), 1000 transaction(s)] done [0.00s].
## sorting and recoding items ... [40 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4
## done [0.01s].
## sorting transactions ... done [0.00s].
## writing ... [8147 set(s)] done [0.00s].
## creating S4 object ... done [0.00s].
cat("Frequent itemsets:", length(freq_sets), "\n")
## Frequent itemsets: 8147
inspect(head(sort(freq_sets, by = "support", decreasing = TRUE), 15))
## items support count
## [1] {refer_scoreorganization_maturity=Mid} 0.953 953
## [2] {Subsidy=Yes} 0.915 915
## [3] {refer_scoreorganization_maturity=Mid,
## Subsidy=Yes} 0.885 885
## [4] {VAT=Czynny} 0.854 854
## [5] {refer_scoreorganization_maturity=Mid,
## VAT=Czynny} 0.818 818
## [6] {Subsidy=Yes,
## VAT=Czynny} 0.810 810
## [7] {refer_scoreorganization_maturity=Mid,
## Subsidy=Yes,
## VAT=Czynny} 0.784 784
## [8] {refer_scorecustomer_trust=Low} 0.610 610
## [9] {refer_scorecustomer_trust=Low,
## refer_scoreorganization_maturity=Mid} 0.573 573
## [10] {refer_scorecustomer_trust=Low,
## Subsidy=Yes} 0.553 553
## [11] {refer_scorecustomer_trust=Low,
## refer_scoreorganization_maturity=Mid,
## Subsidy=Yes} 0.532 532
## [12] {Debtor=Unknown} 0.526 526
## [13] {TECH=Form} 0.517 517
## [14] {refer_scorecustomer_trust=Low,
## VAT=Czynny} 0.514 514
## [15] {Debtor=Unknown,
## refer_scoreorganization_maturity=Mid} 0.501 501
In the next step, association rules were generated using the Apriori algorithm with a minimum support of 5% and a minimum confidence of 65%, again restricting the maximum rule length to four items. Apriori returned 12647 rules; keeping only those with a lift of at least 1.2 left 5873 rules, sorted by lift. These were then filtered further by removing redundant rules in order to retain only the most informative patterns.
# Rule mining - Apriori
rules <- apriori(trans, parameter = list(supp = MIN_SUP, conf = MIN_CONF, maxlen = MAXLEN, target = "rules"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.65 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 4 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 50
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[124 item(s), 1000 transaction(s)] done [0.00s].
## sorting and recoding items ... [40 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4
## done [0.01s].
## writing ... [12647 rule(s)] done [0.00s].
## creating S4 object ... done [0.01s].
rules <- subset(rules, lift >= MIN_LIFT)
rules <- sort(rules, by = "lift", decreasing = TRUE)
cat("Rules:", length(rules), "\n")
## Rules: 5873
inspect(head(rules, 20))
## lhs rhs support confidence coverage lift count
## [1] {Debtor=Unknown,
## refer_scorerefer_score=High} => {refer_scorepayment_morality=High} 0.133 1.0000000 0.133 5.405405 133
## [2] {Age=20y+,
## Debtor=Unknown,
## refer_scorerefer_score=High} => {refer_scorepayment_morality=High} 0.078 1.0000000 0.078 5.405405 78
## [3] {Debtor=Unknown,
## refer_scorecustomer_trust=High,
## refer_scorerefer_score=High} => {refer_scorepayment_morality=High} 0.118 1.0000000 0.118 5.405405 118
## [4] {LegalForm=SPÓŁKA Z OGRANICZONĄ ODPOWIEDZIALNOŚCIĄ,
## refer_scoredevelopment_advance=High,
## refer_scorerefer_score=High} => {refer_scorepayment_morality=High} 0.050 1.0000000 0.050 5.405405 50
## [5] {Debtor=Unknown,
## refer_scoredevelopment_advance=High,
## refer_scorerefer_score=High} => {refer_scorepayment_morality=High} 0.086 1.0000000 0.086 5.405405 86
## [6] {Debtor=Unknown,
## LegalForm=SPÓŁKA Z OGRANICZONĄ ODPOWIEDZIALNOŚCIĄ,
## refer_scorerefer_score=High} => {refer_scorepayment_morality=High} 0.073 1.0000000 0.073 5.405405 73
## [7] {Age=10-20y,
## Debtor=Unknown,
## refer_scorerefer_score=High} => {refer_scorepayment_morality=High} 0.052 1.0000000 0.052 5.405405 52
## [8] {Debtor=Unknown,
## refer_scorerefer_score=High,
## TECH=Google Analytics} => {refer_scorepayment_morality=High} 0.067 1.0000000 0.067 5.405405 67
## [9] {Debtor=Unknown,
## refer_scorerefer_score=High,
## TECH=Google Tag Manager} => {refer_scorepayment_morality=High} 0.076 1.0000000 0.076 5.405405 76
## [10] {Debtor=Unknown,
## refer_scorerefer_score=High,
## SM=Facebook} => {refer_scorepayment_morality=High} 0.082 1.0000000 0.082 5.405405 82
## [11] {Debtor=Unknown,
## refer_scorerefer_score=High,
## TECH=Form} => {refer_scorepayment_morality=High} 0.073 1.0000000 0.073 5.405405 73
## [12] {Debtor=Unknown,
## refer_scorerefer_score=High,
## VAT=Czynny} => {refer_scorepayment_morality=High} 0.126 1.0000000 0.126 5.405405 126
## [13] {Debtor=Unknown,
## refer_scorerefer_score=High,
## Subsidy=Yes} => {refer_scorepayment_morality=High} 0.129 1.0000000 0.129 5.405405 129
## [14] {Debtor=Unknown,
## refer_scoreorganization_maturity=Mid,
## refer_scorerefer_score=High} => {refer_scorepayment_morality=High} 0.132 1.0000000 0.132 5.405405 132
## [15] {Age=20y+,
## Debtor=Unknown,
## refer_scorecustomer_trust=High} => {refer_scorepayment_morality=High} 0.083 0.9880952 0.084 5.341055 83
## [16] {LegalForm=SPÓŁKA Z OGRANICZONĄ ODPOWIEDZIALNOŚCIĄ,
## refer_scorecustomer_trust=High,
## refer_scorerefer_score=High} => {refer_scorepayment_morality=High} 0.065 0.9701493 0.067 5.244050 65
## [17] {refer_scorecustomer_trust=High,
## refer_scorerefer_score=High,
## TECH=Google Analytics} => {refer_scorepayment_morality=High} 0.060 0.9677419 0.062 5.231037 60
## [18] {refer_scoredevelopment_advance=High,
## refer_scoreorganization_maturity=Mid,
## refer_scorerefer_score=High} => {refer_scorepayment_morality=High} 0.090 0.9677419 0.093 5.231037 90
## [19] {Age=20y+,
## refer_scoredevelopment_advance=High,
## refer_scorerefer_score=High} => {refer_scorepayment_morality=High} 0.059 0.9672131 0.061 5.228179 59
## [20] {refer_scorecustomer_trust=High,
## refer_scoredevelopment_advance=High,
## refer_scorerefer_score=High} => {refer_scorepayment_morality=High} 0.080 0.9638554 0.083 5.210029 80
Removing redundant rules dropped 2908 of the 5873 rules, which shows a high overlap between patterns in the dataset. The strongest rules predominantly link a high overall score, combined with other firm characteristics, to high payment morality, indicating a consistent relationship between the main risk score and the payment morality score.
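The idea behind the redundancy filter can be illustrated on three rules copied from the output above. The sketch is a simplified base R reading of the confidence-based criterion (a rule is dropped when a more general rule with the same RHS reaches at least the same confidence). It is not the arules implementation, and `rules_tbl` and `is_redundant_sketch` are hypothetical names.

```r
# Three rules from the output above, all with RHS = refer_scorepayment_morality=High
rules_tbl <- list(
  list(lhs = c("Debtor=Unknown", "refer_scorerefer_score=High"), conf = 1.0),
  list(lhs = c("Age=20y+", "Debtor=Unknown", "refer_scorerefer_score=High"), conf = 1.0),
  list(lhs = c("Age=20y+", "Debtor=Unknown", "refer_scorecustomer_trust=High"), conf = 0.9880952)
)

# A rule is flagged when another rule whose LHS is a proper subset of its LHS
# reaches at least the same confidence
is_redundant_sketch <- function(i) {
  r <- rules_tbl[[i]]
  any(vapply(rules_tbl[-i], function(g) {
    length(g$lhs) < length(r$lhs) && all(g$lhs %in% r$lhs) && g$conf >= r$conf
  }, logical(1)))
}

flags <- vapply(seq_along(rules_tbl), is_redundant_sketch, logical(1))
```

Only the second rule is flagged: adding Age=20y+ to the first rule does not raise its confidence, so the longer rule carries no extra information. The third rule has no more general counterpart in this small set and is kept.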
# Removing the most redundant rules
rules_nr <- rules[!is.redundant(rules)]
cat("Non-redundant rules:", length(rules_nr), "\n")
## Non-redundant rules: 2965
inspect(head(rules_nr, 20))
## lhs rhs support confidence coverage lift count
## [1] {Debtor=Unknown,
## refer_scorerefer_score=High} => {refer_scorepayment_morality=High} 0.133 1.0000000 0.133 5.405405 133
## [2] {LegalForm=SPÓŁKA Z OGRANICZONĄ ODPOWIEDZIALNOŚCIĄ,
## refer_scoredevelopment_advance=High,
## refer_scorerefer_score=High} => {refer_scorepayment_morality=High} 0.050 1.0000000 0.050 5.405405 50
## [3] {Age=20y+,
## Debtor=Unknown,
## refer_scorecustomer_trust=High} => {refer_scorepayment_morality=High} 0.083 0.9880952 0.084 5.341055 83
## [4] {LegalForm=SPÓŁKA Z OGRANICZONĄ ODPOWIEDZIALNOŚCIĄ,
## refer_scorecustomer_trust=High,
## refer_scorerefer_score=High} => {refer_scorepayment_morality=High} 0.065 0.9701493 0.067 5.244050 65
## [5] {refer_scorecustomer_trust=High,
## refer_scorerefer_score=High,
## TECH=Google Analytics} => {refer_scorepayment_morality=High} 0.060 0.9677419 0.062 5.231037 60
## [6] {refer_scoredevelopment_advance=High,
## refer_scoreorganization_maturity=Mid,
## refer_scorerefer_score=High} => {refer_scorepayment_morality=High} 0.090 0.9677419 0.093 5.231037 90
## [7] {Age=20y+,
## refer_scoredevelopment_advance=High,
## refer_scorerefer_score=High} => {refer_scorepayment_morality=High} 0.059 0.9672131 0.061 5.228179 59
## [8] {refer_scorecustomer_trust=High,
## refer_scoredevelopment_advance=High,
## refer_scorerefer_score=High} => {refer_scorepayment_morality=High} 0.080 0.9638554 0.083 5.210029 80
## [9] {Age=20y+,
## refer_scorecustomer_trust=High,
## refer_scorerefer_score=High} => {refer_scorepayment_morality=High} 0.073 0.9605263 0.076 5.192034 73
## [10] {refer_scorecustomer_trust=High,
## refer_scorerefer_score=High,
## SM=Facebook} => {refer_scorepayment_morality=High} 0.072 0.9600000 0.075 5.189189 72
## [11] {refer_scoredevelopment_advance=High,
## refer_scorerefer_score=High} => {refer_scorepayment_morality=High} 0.090 0.9574468 0.094 5.175388 90
## [12] {refer_scorecustomer_trust=High,
## refer_scorerefer_score=High,
## VAT=Czynny} => {refer_scorepayment_morality=High} 0.116 0.9508197 0.122 5.139566 116
## [13] {refer_scorerefer_score=High,
## SM=Facebook,
## TECH=Google Tag Manager} => {refer_scorepayment_morality=High} 0.057 0.9500000 0.060 5.135135 57
## [14] {refer_scorecustomer_trust=High,
## refer_scorerefer_score=High,
## TECH=Google Tag Manager} => {refer_scorepayment_morality=High} 0.068 0.9444444 0.072 5.105105 68
## [15] {Age=20y+,
## refer_scorerefer_score=High,
## TECH=Google Tag Manager} => {refer_scorepayment_morality=High} 0.050 0.9433962 0.053 5.099439 50
## [16] {refer_scorecustomer_trust=High,
## refer_scorerefer_score=High,
## Subsidy=Yes} => {refer_scorepayment_morality=High} 0.119 0.9370079 0.127 5.064907 119
## [17] {refer_scorecustomer_trust=High,
## refer_scorerefer_score=High} => {refer_scorepayment_morality=High} 0.123 0.9318182 0.132 5.036855 123
## [18] {refer_scoreorganization_maturity=Mid,
## refer_scorerefer_score=High,
## TECH=Google Tag Manager} => {refer_scorepayment_morality=High} 0.076 0.9268293 0.082 5.009888 76
## [19] {refer_scorerefer_score=High,
## SM=Facebook,
## TECH=Google Analytics} => {refer_scorepayment_morality=High} 0.051 0.9107143 0.056 4.922780 51
## [20] {refer_scoreorganization_maturity=Mid,
## refer_scorerefer_score=High,
## TECH=Google Analytics} => {refer_scorepayment_morality=High} 0.068 0.9066667 0.075 4.900901 68
However, the remaining 2965 non-redundant rules are still a large set, and many of them relate one scoring component to another. To align the analysis with the research objective and to improve interpretability, the rule collection was narrowed down using domain-driven constraints.
Firstly, the analysis was restricted to rules that predict low customer trust (refer_scorecustomer_trust=Low) and have a single-item right-hand side. This step reduced the rule set to 170 rules, ensuring that each rule directly addresses the outcome of interest.
# Keeping only rules with single-item RHS, that indicate low customer trust
rules_nr_1rhs <- rules_nr[size(rhs(rules_nr)) == 1]
rules_trust_low <- subset(rules_nr_1rhs, rhs %pin% "refer_scorecustomer_trust=Low")
rules_trust_low <- sort(rules_trust_low, by = "lift", decreasing = TRUE)
cat("\nRules with RHS = refer_scorecustomer_trust=Low :", length(rules_trust_low), "\n")
##
## Rules with RHS = refer_scorecustomer_trust=Low : 170
inspect(head(rules_trust_low, 10))
## lhs rhs support confidence coverage lift count
## [1] {refer_scoredevelopment_advance=Mid,
## refer_scorepayment_morality=Mid,
## refer_scorerefer_score=Mid} => {refer_scorecustomer_trust=Low} 0.087 0.9886364 0.088 1.620715 87
## [2] {Debtor=Unknown,
## refer_scorepayment_morality=Mid,
## refer_scorerefer_score=Mid} => {refer_scorecustomer_trust=Low} 0.156 0.9873418 0.158 1.618593 156
## [3] {Age=20y+,
## Debtor=Unknown,
## refer_scorepayment_morality=Mid} => {refer_scorecustomer_trust=Low} 0.059 0.9833333 0.060 1.612022 59
## [4] {Debtor=Unknown,
## refer_scorerefer_score=Mid,
## SM=Facebook} => {refer_scorecustomer_trust=Low} 0.107 0.9816514 0.109 1.609265 107
## [5] {Debtor=Unknown,
## refer_scoredevelopment_advance=Mid,
## refer_scorerefer_score=Mid} => {refer_scorecustomer_trust=Low} 0.091 0.9784946 0.093 1.604090 91
## [6] {Debtor=Unknown,
## refer_scorerefer_score=Mid,
## Size=1-10} => {refer_scorecustomer_trust=Low} 0.128 0.9696970 0.132 1.589667 128
## [7] {Debtor=Unknown,
## refer_scorerefer_score=Mid,
## TECH=Form} => {refer_scorecustomer_trust=Low} 0.121 0.9680000 0.125 1.586885 121
## [8] {refer_scoredevelopment_advance=Low,
## refer_scorepayment_morality=Low,
## refer_scorerefer_score=High} => {refer_scorecustomer_trust=Low} 0.150 0.9677419 0.155 1.586462 150
## [9] {Debtor=Unknown,
## refer_scorerefer_score=Mid,
## TECH=Google Tag Manager} => {refer_scorecustomer_trust=Low} 0.110 0.9649123 0.114 1.581823 110
## [10] {LegalForm=OSOBA FIZYCZNA PROWADZĄCA DZIAŁALNOŚĆ GOSPODARCZĄ,
## refer_scorepayment_morality=Mid,
## refer_scorerefer_score=Mid} => {refer_scorecustomer_trust=Low} 0.137 0.9647887 0.142 1.581621 137
Secondly, all rules containing score-based variables on the left-hand side were excluded to avoid uninformative “score predicts score” relationships. This resulted in a concise set of 29 rules that link non-score business attributes, such as firm age, size, legal form, debtor status, VAT status and online footprint, to low customer trust.
# Exclude score-items from LHS to avoid "score --> score"
is_score_item <- function(x) grepl("^refer_score", x)
lhs_list <- LIST(lhs(rules_trust_low), decode = TRUE)
lhs_has_score <- vapply(lhs_list, function(items) any(is_score_item(items)), logical(1))
rules_trust_low_noscorelhs <- rules_trust_low[!lhs_has_score]
rules_trust_low_noscorelhs <- sort(rules_trust_low_noscorelhs, by = "lift", decreasing = TRUE)
cat("\nRules (trust=Low) with no score-items on LHS:", length(rules_trust_low_noscorelhs), "\n")
##
## Rules (trust=Low) with no score-items on LHS: 29
inspect(head(rules_trust_low_noscorelhs, 20))
## lhs rhs support confidence coverage lift count
## [1] {Age=3-10y,
## Debtor=Unknown,
## Size=1-10} => {refer_scorecustomer_trust=Low} 0.069 0.9452055 0.073 1.549517 69
## [2] {Age=3-10y,
## Debtor=Unknown,
## TECH=Google Analytics} => {refer_scorecustomer_trust=Low} 0.055 0.9016393 0.061 1.478097 55
## [3] {Age=3-10y,
## Debtor=Unknown,
## SM=Facebook} => {refer_scorecustomer_trust=Low} 0.066 0.8800000 0.075 1.442623 66
## [4] {Age=3-10y,
## Size=1-10,
## SM=Facebook} => {refer_scorecustomer_trust=Low} 0.060 0.8695652 0.069 1.425517 60
## [5] {Age=3-10y,
## Debtor=Unknown,
## TECH=Google Tag Manager} => {refer_scorecustomer_trust=Low} 0.066 0.8684211 0.076 1.423641 66
## [6] {Age=3-10y,
## Size=1-10,
## VAT=Czynny} => {refer_scorecustomer_trust=Low} 0.093 0.8611111 0.108 1.411658 93
## [7] {Debtor=Unknown,
## Size=1-10,
## TECH=Form} => {refer_scorecustomer_trust=Low} 0.128 0.8533333 0.150 1.398907 128
## [8] {Debtor=Unknown,
## LegalForm=OSOBA FIZYCZNA PROWADZĄCA DZIAŁALNOŚĆ GOSPODARCZĄ,
## Size=1-10} => {refer_scorecustomer_trust=Low} 0.147 0.8497110 0.173 1.392969 147
## [9] {Age=3-10y,
## Size=1-10,
## TECH=Form} => {refer_scorecustomer_trust=Low} 0.067 0.8481013 0.079 1.390330 67
## [10] {Age=3-10y,
## Debtor=Unknown,
## TECH=Form} => {refer_scorecustomer_trust=Low} 0.070 0.8433735 0.083 1.382579 70
## [11] {Age=3-10y,
## Debtor=Unknown} => {refer_scorecustomer_trust=Low} 0.113 0.8432836 0.134 1.382432 113
## [12] {Debtor=Unknown,
## Size=1-10,
## VAT=Czynny} => {refer_scorecustomer_trust=Low} 0.186 0.8340807 0.223 1.367345 186
## [13] {Age=3-10y,
## Size=1-10} => {refer_scorecustomer_trust=Low} 0.110 0.8333333 0.132 1.366120 110
## [14] {Debtor=Unknown,
## Size=1-10,
## TECH=Google Tag Manager} => {refer_scorecustomer_trust=Low} 0.109 0.8257576 0.132 1.353701 109
## [15] {Age=3-10y,
## LegalForm=OSOBA FIZYCZNA PROWADZĄCA DZIAŁALNOŚĆ GOSPODARCZĄ,
## VAT=Czynny} => {refer_scorecustomer_trust=Low} 0.052 0.8253968 0.063 1.353110 52
## [16] {Age=3-10y,
## LegalForm=OSOBA FIZYCZNA PROWADZĄCA DZIAŁALNOŚĆ GOSPODARCZĄ} => {refer_scorecustomer_trust=Low} 0.056 0.8235294 0.068 1.350048 56
## [17] {Debtor=Unknown,
## Size=1-10,
## SM=Facebook} => {refer_scorecustomer_trust=Low} 0.107 0.8230769 0.130 1.349306 107
## [18] {Debtor=Unknown,
## Size=1-10} => {refer_scorecustomer_trust=Low} 0.196 0.8200837 0.239 1.344399 196
## [19] {Debtor=Unknown,
## LegalForm=OSOBA FIZYCZNA PROWADZĄCA DZIAŁALNOŚĆ GOSPODARCZĄ,
## VAT=Czynny} => {refer_scorecustomer_trust=Low} 0.174 0.8093023 0.215 1.326725 174
## [20] {Debtor=Unknown,
## LegalForm=OSOBA FIZYCZNA PROWADZĄCA DZIAŁALNOŚĆ GOSPODARCZĄ,
## TECH=Google Tag Manager} => {refer_scorecustomer_trust=Low} 0.104 0.8062016 0.129 1.321642 104
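As a sanity check, the measures reported for rule [1] above can be recomputed from its counts. The figures below are read from the printed output (coverage 0.073, count 69) and from the frequency of refer_scorecustomer_trust=Low (610 of 1000 transactions); the variable names are only illustrative.

```r
n_total <- 1000  # transactions
n_lhs   <- 73    # firms matching the LHS (coverage 0.073)
n_both  <- 69    # of these, firms with low customer trust (count)
n_rhs   <- 610   # all firms with refer_scorecustomer_trust=Low

confidence <- n_both / n_lhs        # share of LHS firms with low trust
baseline   <- n_rhs / n_total       # share of all firms with low trust
lift       <- confidence / baseline # how much the LHS raises that share

round(c(confidence = confidence, baseline = baseline, lift = lift), 4)
```

Read this way, about 95% of the young, small firms with unknown debtor status fall into the low-trust bin, compared with 61% of all firms in the sample.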
The strongest rules indicate that relatively young firms (3-10 years of operation) with unknown debtor status are particularly associated with low customer trust, especially when they also have a small number of employees (1-10). The Unknown debtor value appears in most of the final rules. Its frequency (526 transactions) matches the 526 rows with a missing debt amount, so it may simply reflect a lack of information, although it could also indicate limited transparency regarding the debt burden. The rules describe associations rather than causes, so they suggest, but do not prove, that young, small firms with an unclear debt position tend to be rated as less trustworthy.
Additionally, several rules include online-presence items, where firms that use common digital tools such as Facebook or simple web forms (TECH=Form), in combination with the attributes above, show an increased likelihood of low customer trust. This may reflect limited reputation or lower levels of digital maturity, which may affect how potential customers perceive reliability.
Overall, the results show that low customer trust is not tied to a single factor but to specific combinations of structural and behavioral attributes. This supports the idea that trust-related risk emerges from the interaction of firm age, number of employees, financial transparency and digital footprint rather than from isolated characteristics.