1 Introduction

This study is based on original data collected personally last year as part of a survey for NLB Banka. The primary objective is to analyse and better understand the preferences of young individuals (specifically those aged 18 to 27) regarding the ongoing transition from cash to digital payment methods. In an increasingly dynamic financial ecosystem, understanding how “digital natives” perceive the security, speed, and convenience of emerging payment instruments is critically important for banking institutions seeking to support and guide this transformation effectively.

The resulting dataset consists of 440 observations and a comprehensive set of variables covering spending behaviour and risk perception, providing a robust foundation for multivariate statistical analysis. To address the complexity of the data structure, the quantitative techniques discussed during the course will be applied within an exploratory analytical framework. Specifically, the empirical analysis will be organised into two main stages:

Principal Component Analysis (PCA): used to reduce the dimensionality of the dataset and extract latent constructs such as perceived security (PCASafety), ease of use (PCAEase), and spending control (PCAControl), while also mitigating potential multicollinearity issues;
Cluster Analysis: used to segment the sample into homogeneous groups using Ward’s hierarchical method and the K-means algorithm. This approach enables the identification and mapping of heterogeneous behavioural profiles among young consumers, rather than treating them as a homogeneous population.
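
The two stages can be sketched on simulated data as follows. This is a minimal illustration, not the analysis reported in this study: the simulated items, the object names (sim_items, pc_scores, ward_groups, km_fit), the choice of two components, and the choice of three clusters are all assumptions made for the example.

```r
# Illustrative sketch on simulated data; all object names are hypothetical.
set.seed(123)

# Simulate 200 respondents answering six 7-point items
sim_items <- as.data.frame(matrix(sample(1:7, 200 * 6, replace = TRUE), ncol = 6))
colnames(sim_items) <- paste0("item", 1:6)

# Stage 1: PCA on standardised items; keep the first two component scores
pca_fit <- prcomp(sim_items, center = TRUE, scale. = TRUE)
pc_scores <- pca_fit$x[, 1:2]

# Stage 2a: Ward's hierarchical clustering on the component scores
ward_fit <- hclust(dist(pc_scores), method = "ward.D2")
ward_groups <- cutree(ward_fit, k = 3)   # k = 3 is an arbitrary choice here

# Stage 2b: K-means initialised at the Ward cluster centroids
init_centers <- apply(pc_scores, 2, function(s) tapply(s, ward_groups, mean))
km_fit <- kmeans(pc_scores, centers = init_centers, iter.max = 50)

table(km_fit$cluster)
```

Using the Ward solution to initialise K-means is one common way of combining the two methods; it removes the dependence of K-means on random starting points.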

2 Literature review

Recent academic literature confirms that the transition towards a “cashless society” is a global phenomenon, although it is characterised by dynamics specific to younger populations.

According to Demir et al. (2024), in their study on the intention to use integrated payment systems, the most influential factor for young people is not simply perceived ease of use, but rather lifestyle compatibility. This suggests that young consumers are more likely to adopt digital payment instruments that integrate seamlessly into their daily routines, such as mobile applications and “one-click” payment solutions. Usman et al. (2025) also emphasise the crucial role of financial literacy and perceived behavioural control. Young people who feel more competent in managing their financial resources are significantly more likely to develop a clear behavioural intention towards fintech adoption.

Despite strong momentum towards digitalisation, cash continues to retain both psychological and practical relevance. Puusniekka (2020) introduces the concept of the “pain of paying,” arguing that the tangible nature of cash enhances mental accounting and expenditure control. In contrast, debit cards and other electronic instruments tend to increase willingness to spend, as the separation from money becomes less salient and less psychologically “visible.” This research highlights that many young people still perceive cash as a “safe” or even “sacred” method for preventing overspending and over-indebtedness – concerns that also emerge clearly from the data collected for NLB (Puusniekka, 2020).

The adoption of financial technologies is strongly shaped by social environments. Both Demir et al. (2024) and Usman et al. (2025) identify social influence – stemming from friends, family, and peer groups – as a significant predictor of behavioural intention, particularly for peer-to-peer (P2P) payment applications, where network effects are central. Nevertheless, substantial barriers persist, especially those related to privacy and data security. Concerns about the protection of personal information often hinder a full transition to exclusively smartphone-based systems or neobanking solutions.

3 Dataset overview

library(readr)
library(dplyr)

NLB_data <- read_csv2(
  "~/Desktop/NLB/nlb_data.csv",
  locale = locale(encoding = "UTF-8")
)

# Remove columns 2 to 8 and 101 to 116
NLB_data <- NLB_data %>% 
  select(-c(2:8, 101:116))

#Remove first row
NLB_data <- NLB_data[-1, ]

head(NLB_data)

## # A tibble: 6 × 93
##   status Q1    Q3a   Q3b   Q3c   Q3d   Q4a   Q4b   Q4c   Q4d   Q5    Q5_4_text
##   <chr>  <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>    
## 1 6      7     3     2     5     2     4     3     2     1     2     -2       
## 2 6      9     2     5     1     1     2     1     1     1     1     -2       
## 3 6      9     2     4     4     1     2     2     1     1     2     -2       
## 4 6      7     3     5     5     1     2     2     1     1     1     -2       
## 5 6      7     2     5     5     1     2     1     1     1     1     -2       
## 6 6      9     3     3     1     2     4     3     2     1     2     -2       
## # ℹ 81 more variables: Q6a <chr>, Q6b <chr>, Q6c <chr>, Q6d <chr>, Q6e <chr>,
## #   Q6f <chr>, Q6g <chr>, Q7 <chr>, Q8a <dbl>, Q9a <chr>, Q9b <chr>, Q9c <chr>,
## #   Q9d <chr>, Q9e <chr>, Q9f <chr>, Q9g <chr>, Q9h <chr>, Q31_2a <chr>,
## #   Q31_2b <chr>, Q31_2c <chr>, Q31_2d <chr>, Q31_2e <chr>, Q31_2f <chr>,
## #   Q10a <chr>, Q10b <chr>, Q10c <chr>, Q10d <chr>, Q10e <chr>, Q11a <chr>,
## #   Q11b <chr>, Q11c <chr>, Q11d <chr>, Q11e <chr>, Q12a <chr>, Q12b <chr>,
## #   Q12c <chr>, Q12d <chr>, Q12e <chr>, Q13a <chr>, Q13b <chr>, Q13c <chr>, …

Unit of observation: Each row represents a single respondent from the questionnaire, which was completed by young people aged 18 to 27.
Sample size: The dataset includes responses from a total of 440 young individuals.

3.1 Variables description

Status: status of the questionnaire (6 = valid, 5 = invalid)

Q1: Age (codes 2 to 11 correspond to ages 18 to 27)

Q3: How many times a month do you use the following payment methods on average? (1 = never, 2 = 1-3 times a month, 3 = once a week, 4 = several times a week, 5 = every day)

Q3a: Cash
Q3b: Debit or Credit (physical) cards
Q3c: Phone
Q3d: Other (PayPal, Stripe…)

Q4: How often do you use cash payment for the following purchases? (1 = never, 2 = less than half, 3 = half, 4 = more than half, 5 = always)

Q4a: Purchases up to €10
Q4b: Purchases between €11 and €99
Q4c: Purchases between €100 and €1000
Q4d: Purchases over €1000

Q5: How do you usually respond if a merchant does not accept digital payments?

1: I prefer to go elsewhere and pay with a digital payment method.
2: I pay in the form available, even if it means withdrawing cash from an ATM.
3: I have not encountered such a situation yet.
4: Other (please specify).

Q6: Which of the following income sources do you have, and how do you receive them? (1 = fully cash, 2 = half cash/half digital, 3 = fully digital, 4 = don’t get income from this source, 5 = don’t want to answer)

Q6a: Salary or earnings from student work
Q6b: Pocket Money (from family members)
Q6c: Gifts (for birthdays, holidays, etc.)
Q6d: Unreported or occasional work (childcare, tutoring, etc.)
Q6e: Government and/or social benefits
Q6f: Government and other forms of scholarships (e.g., Zois, corporate)
Q6g: Returns from investments (stocks, bonds, cryptocurrencies, etc.)

Q7: Do you save money? (This question does NOT include savings from parents or family members.)

1: Yes
2: No

Q8: In what form do you save money? (Digital vs. cash savings) (1 = fully cash…7 = fully digital)

Q9: Various attitudes towards digital and cash payments. (1 = strongly disagree…7 = strongly agree)

Q9a: I usually spend money in the form in which I received it.
Q9b: I feel concerned about the security of my personal information when using digital payment methods.
Q9c: I find digital payments less secure than cash payments.
Q9d: I have more confidence in digital payment methods if they offer features like two-factor authentication.
Q9e: I feel safe when I carry cash with me.
Q9f: I prefer to use digital payments because they are more convenient and save time.
Q9g: I prefer to use cash to avoid overspending.
Q9h: I use cash only when digital payments are not possible.

Q31_2: Please rate the importance of each of the following factors that influence your choice of payment method. (1 = not important at all…7 = extremely important)

Q31_2a: Ease of use
Q31_2b: Speed of transaction
Q31_2c: Ability to use in stores
Q31_2d: Security of the payment method
Q31_2e: Features for tracking and budgeting
Q31_2f: Privacy

Q10: How safe do you think the following payment methods are? (1 = strongly not safe…7 = strongly safe)

Q10a: Cash
Q10b: (Physical) Debit Card
Q10c: (Physical) Credit Card
Q10d: Paying with your phone (Flik, Apple Pay…)
Q10e: Neobanks (Revolut, N26…)

Q11: How easy do you find the following payment methods to use? (1 = strongly not easy…7 = strongly easy)

Q11a: Cash
Q11b: (Physical) Debit Card
Q11c: (Physical) Credit Card
Q11d: Paying with your phone (Flik, Apple Pay…)
Q11e: Neobanks (Revolut, N26…)

Q12: How accepted do you think the following payment methods are in stores in your environment? (1 = strongly not accepted…7 = strongly accepted)

Q12a: Cash
Q12b: (Physical) Debit Card
Q12c: (Physical) Credit Card
Q12d: Paying with your phone (Flik, Apple Pay…)
Q12e: Neobanks (Revolut, N26…)

Q13: How fast do you consider the following payment methods? (1 = strongly not fast…7 = strongly fast)

Q13a: Cash
Q13b: (Physical) Debit Card
Q13c: (Physical) Credit Card
Q13d: Paying with your phone (Flik, Apple Pay…)
Q13e: Neobanks (Revolut, N26…)

Q14: How do you consider the following payment methods from a privacy perspective? (1 = strongly not private…7 = strongly private)

Q14a: Cash
Q14b: (Physical) Debit Card
Q14c: (Physical) Credit Card
Q14d: Paying with your phone (Flik, Apple Pay…)
Q14e: Neobanks (Revolut, N26…)

Q15: How do you consider the following payment methods in terms of control over your spending? (1 = I totally don’t have control…7 = I have total control)

Q15a: Cash
Q15b: (Physical) Debit Card
Q15c: (Physical) Credit Card
Q15d: Paying with your phone (Flik, Apple Pay…)
Q15e: Neobanks (Revolut, N26…)

Q16: Social influence on payment method choice. (1 = strongly disagree…7 = strongly agree)

Q16a: I choose the payment methods my friends choose.
Q16b: I choose the payment methods my family members choose.

Q17: How do you most often share expenses among friends?

1: With cash
2: Through mobile applications (Flik, Revolut, PayPal…)
3: By bank transfer
4: I don’t share expenses among friends
5: Other (please specify)

Q18: Reasons for preferring digital payments. (0 = Yes, 1 = No)

Q18a: I can quote the exact sum.
Q18b: To avoid paying with cash.
Q18c: Because the process is quick and convenient.
Q18d: Because I have my transactions recorded and it is easier to manage finances.
Q18e: Other (please specify).

Q19: Concerns about digital payment security. (1 = not concerned at all…7 = extremely concerned)

Q19a: Fraud (e.g., stealing money)
Q19b: Disclosure of Personal Information
Q19c: Identity theft
Q19d: Loss of access due to a hacker attack

Q20: Experience with online fraud.

1: Yes, it happened to me.
2: Yes, I know people who have had this happen.
3: Yes, it has happened to me and others I know.
4: I’ve never encountered such a situation.

Q21: How did this affect your behavior in further payment habits? (0 = Yes, 1 = No)

Q21a: I use cash more often in unfamiliar or suspicious situations (e.g., when traveling).
Q21b: I use digital payment methods (e.g., virtual or disposable cards facilitated by neobanks) more often in unfamiliar or suspicious situations.
Q21c: I am more cautious with online payments.
Q21d: I switched to more secure payment options (e.g., mobile wallets with authentication).
Q21e: My behavior hasn’t changed.

Q22: If you had the opportunity, would you switch to digital payments entirely?

1: Yes, immediately.
2: I would consider it, but I wouldn’t want to give up cash completely.
3: No, I prefer to use cash.
4: I don’t know.

Q23: What is your gender?

1: Man.
2: Woman.
3: Another.
4: I don’t want to answer.

Q24: What is your highest level of educational attainment?

1: Unfinished primary school.
2: Completed primary school.
3: Completed lower or secondary vocational education.
4: Completed secondary professional or general education.
5: Completed tertiary professional or tertiary professional education (including 1st Bologna level).
6: Completed higher university education (including 2nd Bologna level).
7: Completed specialization, scientific master’s degree, PhD.

Q25: What is your current status?

1: Student.
2: Employed.
3: Self-employed.
4: Unemployed.
5: Other (please specify).

Q26: What is your current net monthly income?

1: 0-200 EUR.
2: 201-500 EUR.
3: 501-800 EUR.
4: 801-1300 EUR.
5: More than 1300 EUR.

Q27: Which bank do you currently use as your primary bank?

1: NLB.
2: OTP.
3: Intesa Sanpaolo.
4: Sparkasse.
5: Addiko Bank.
6: Workers’ Savings Bank.
7: Other (please specify).

4 Data manipulation

# Remove all questionnaires with a non-valid status (5 = invalid)
library(dplyr)

NLB_data <- NLB_data %>%
  filter(status != 5)

# Remove the status column
NLB_data <- NLB_data %>%
  select(-status)

# Remove respondents under 18 or over 27 (Q1 codes 1 and 12)
NLB_data <- NLB_data %>%
  filter(!Q1 %in% c(1, 12))

# Rename columns
colnames(NLB_data) <- c("Age", "Cash_Use", "Card_Use", "Phone_Use", "OtherPay_Use",
                        "Cash_Up10", "Cash_11_99", "Cash_100_1000", "Cash_Over1000",
                        "NoDigital_Response", "NoDigital_Response_Text", "Income_StudJobSalary", "Income_PocketMoney", 
                        "Income_Gifts", "Income_Occasional", "Income_Subsidy", 
                        "Income_Scholarship", "Income_Investments", "Save_Money", 
                        "Save_Form", "Spend_SameForm", "Concern_Security", "LessSecure_Digital", 
                        "Trust_2FA", "Safe_CashCarry", "Prefer_Digital_Convenience", 
                        "Prefer_Cash_Control", "Use_Cash_IfNoDigital", "Importance_Ease", "Importance_Speed",  
                        "Importance_Availability", "Importance_Security", "Importance_TrackingBudgeting", 
                        "Importance_Privacy", "Safe_Cash", "Safe_DebitCard", "Safe_CreditCard", "Safe_PhonePay", 
                        "Safe_Neobank", "Easy_Cash", "Easy_DebitCard", "Easy_CreditCard", "Easy_PhonePay", 
                        "Easy_Neobank", "Accept_Cash", "Accept_DebitCard", "Accept_CreditCard", 
                        "Accept_PhonePay", "Accept_Neobank", "Fast_Cash", "Fast_DebitCard", 
                        "Fast_CreditCard", "Fast_PhonePay", "Fast_Neobank", "Private_Cash", 
                        "Private_DebitCard", "Private_CreditCard", "Private_PhonePay", 
                        "Private_Neobank", "Control_Cash", "Control_DebitCard", "Control_CreditCard", 
                        "Control_PhonePay", "Control_Neobank", "Social_Friends", "Social_Family", 
                        "Expense_Sharing", "Expense_Sharing_Text", "Reason_ExactSum", "Reason_NoCash", 
                        "Reason_Convenient", "Reason_TrackFinances", "Reason_Other", "Reason_Other_Text",  
                        "Concern_Fraud", "Concern_PersonalInfo", "Concern_IDTheft", 
                        "Concern_Hacker", "OnlineFraud_Exp", "Behavior_MoreCash", 
                        "Behavior_SecureDigital", "Behavior_Cautious", "Behavior_SecureOption", 
                        "Behavior_NoChange", "Switch_Digital", "Gender", "Education", 
                        "Status_Employment", "Status_Employment_Text", "Income_Level", "Primary_Bank", "Primary_Bank_Text")

4.1 Factoring

# Q1
NLB_data$AgeF <- factor(NLB_data$Age,
                       levels = c(2:11),
                       labels = c(18:27))

#Q3a

NLB_data$Cash_UseF <- factor(NLB_data$Cash_Use,
                       levels = c(1, 2, 3, 4, 5),
                       labels = c("Never", "1-3 monthly", "1 per week", "Several times a week", "Daily"))

#Q3b
NLB_data$Card_UseF <- factor(NLB_data$Card_Use,
                       levels = c(1, 2, 3, 4, 5),
                       labels = c("Never", "1-3 monthly", "1 per week", "Several times a week", "Daily"))

#Q3c

NLB_data$Phone_UseF <- factor(NLB_data$Phone_Use,
                       levels = c(1, 2, 3, 4, 5),
                       labels = c("Never", "1-3 monthly", "1 per week", "Several times a week", "Daily"))

#Q3d

NLB_data$OtherPay_UseF <- factor(NLB_data$OtherPay_Use,
                       levels = c(1, 2, 3, 4, 5),
                       labels = c("Never", "1-3 monthly", "1 per week", "Several times a week", "Daily"))

#Q4a

NLB_data$Cash_Up10F <- factor(NLB_data$Cash_Up10,
                       levels = c(1, 2, 3, 4, 5),
                       labels = c("Never", "Less than half", "Half", "More than half", "Always"))

#Q4b

NLB_data$Cash_11_99F <- factor(NLB_data$Cash_11_99,
                       levels = c(1, 2, 3, 4, 5),
                       labels = c("Never", "Less than half", "Half", "More than half", "Always"))

#Q4c

NLB_data$Cash_100_1000F <- factor(NLB_data$Cash_100_1000,
                       levels = c(1, 2, 3, 4, 5),
                       labels = c("Never", "Less than half", "Half", "More than half", "Always"))

#Q4d

NLB_data$Cash_Over1000F <- factor(NLB_data$Cash_Over1000,
                       levels = c(1, 2, 3, 4, 5),
                       labels = c("Never", "Less than half", "Half", "More than half", "Always"))

# Q5
NLB_data$NoDigital_ResponseF <- factor(NLB_data$NoDigital_Response,
                       levels = c(1, 2, 3, 4),
                       labels = c("Pay digital elsewhere", "Pay as available", "Never occurred", "Other"))

#Q6a 

NLB_data$Income_StudJobSalaryF <- factor(NLB_data$Income_StudJobSalary,
                       levels = c(1, 2, 3, 4, 5),
                       labels = c("Cash", "Cash&Digitally", "Digitally", "Not using", "Don't want to answer"))

#Q6b

NLB_data$Income_PocketMoneyF <- factor(NLB_data$Income_PocketMoney,
                       levels = c(1, 2, 3, 4, 5),
                       labels = c("Cash", "Cash&Digitally", "Digitally", "Not using", "Don't want to answer"))

#Q6c

NLB_data$Income_GiftsF <- factor(NLB_data$Income_Gifts,
                       levels = c(1, 2, 3, 4, 5),
                       labels = c("Cash", "Cash&Digitally", "Digitally", "Not using", "Don't want to answer"))

#Q6d

NLB_data$Income_OccasionalF <- factor(NLB_data$Income_Occasional,
                       levels = c(1, 2, 3, 4, 5),
                       labels = c("Cash", "Cash&Digitally", "Digitally", "Not using", "Don't want to answer"))

#Q6e

NLB_data$Income_SubsidyF <- factor(NLB_data$Income_Subsidy,
                       levels = c(1, 2, 3, 4, 5),
                       labels = c("Cash", "Cash&Digitally", "Digitally", "Not using", "Don't want to answer"))

#Q6f

NLB_data$Income_ScholarshipF <- factor(NLB_data$Income_Scholarship,
                       levels = c(1, 2, 3, 4, 5),
                       labels = c("Cash", "Cash&Digitally", "Digitally", "Not using", "Don't want to answer"))
#Q6g

NLB_data$Income_InvestmentsF <- factor(NLB_data$Income_Investments,
                       levels = c(1, 2, 3, 4, 5),
                       labels = c("Cash", "Cash&Digitally", "Digitally", "Not using", "Don't want to answer"))

# Q17

NLB_data$Expense_SharingF <- factor(NLB_data$Expense_Sharing,
                       levels = c(1, 2, 3, 4, 5),
                       labels = c("Cash", "Mobile apps", "Bank transfer", "Don't share", "Other"))

# Q20

NLB_data$OnlineFraud_ExpF <- factor(NLB_data$OnlineFraud_Exp,
                       levels = c(1, 2, 3, 4),
                       labels = c("Yes - me", "Yes - others", "Yes - both", "No"))

# Q22

NLB_data$Switch_DigitalF <- factor(NLB_data$Switch_Digital,
                       levels = c(1, 2, 3, 4),
                       labels = c("Fully digital", "Balance digital-cash", "Cash", "Don't know"))

# Q23

NLB_data$GenderF <- factor(NLB_data$Gender,
                       levels = c(1, 2, 3, 4),
                       labels = c("Man", "Woman", "Other", "Don't want to answer"))

# Q24

NLB_data$EducationF <- factor(NLB_data$Education,
                       levels = c(1, 2, 3, 4, 5, 6, 7),
                       labels = c("Unfinished primary", "Primary school", "Vocational education", "High School", "Bachelor Degree", "Master Degree", "PhD" ))

# Q25

NLB_data$Status_EmploymentF <- factor(NLB_data$Status_Employment,
                       levels = c(1, 2, 3, 4, 5),
                       labels = c("Student", "Employed", "Self-employed", "Unemployed", "Other"))

# Q26

NLB_data$Income_LevelF <- factor(NLB_data$Income_Level,
                       levels = c(1, 2, 3, 4, 5),
                       labels = c("0-200 EUR", "201-500 EUR", "501-800 EUR", "801-1300 EUR", "Above 1300 EUR"))

# Q27

NLB_data$Primary_BankF <- factor(NLB_data$Primary_Bank,
                       levels = c(1, 2, 3, 4, 5, 6, 7),
                       labels = c("NLB", "OTP", "Intesa Sanpaolo", "Sparkasse", "Addiko Bank", "Delavska Hranilnica", "Other"))

head(NLB_data %>% select(ends_with("F")))

## # A tibble: 6 × 25
##   AgeF  Cash_UseF   Card_UseF    Phone_UseF OtherPay_UseF Cash_Up10F Cash_11_99F
##   <fct> <fct>       <fct>        <fct>      <fct>         <fct>      <fct>      
## 1 23    1 per week  1-3 monthly  Daily      1-3 monthly   More than… Half       
## 2 25    1-3 monthly Daily        Never      Never         Less than… Never      
## 3 25    1-3 monthly Several tim… Several t… Never         Less than… Less than …
## 4 23    1 per week  Daily        Daily      Never         Less than… Less than …
## 5 23    1-3 monthly Daily        Daily      Never         Less than… Never      
## 6 25    1 per week  1 per week   Never      1-3 monthly   More than… Half       
## # ℹ 18 more variables: Cash_100_1000F <fct>, Cash_Over1000F <fct>,
## #   NoDigital_ResponseF <fct>, Income_StudJobSalaryF <fct>,
## #   Income_PocketMoneyF <fct>, Income_GiftsF <fct>, Income_OccasionalF <fct>,
## #   Income_SubsidyF <fct>, Income_ScholarshipF <fct>,
## #   Income_InvestmentsF <fct>, Expense_SharingF <fct>, OnlineFraud_ExpF <fct>,
## #   Switch_DigitalF <fct>, GenderF <fct>, EducationF <fct>,
## #   Status_EmploymentF <fct>, Income_LevelF <fct>, Primary_BankF <fct>
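
Because many of the variables above share the same coding, the repeated factor() calls could also be written once per scale. The following is a minimal sketch on a toy data frame, assuming dplyr is installed; the objects toy and use_labels are hypothetical and only mirror the Q3 coding used above.

```r
library(dplyr)

# Shared labels for the Q3 frequency scale (same wording as in the code above)
use_labels <- c("Never", "1-3 monthly", "1 per week", "Several times a week", "Daily")

# Toy data: codes stored as character, as in the imported survey file
toy <- data.frame(
  Cash_Use = c("1", "3", "5"),
  Card_Use = c("5", "4", "2")
)

# Create one factor column per input column, suffixed with "F"
toy <- toy %>%
  mutate(across(c(Cash_Use, Card_Use),
                ~ factor(.x, levels = 1:5, labels = use_labels),
                .names = "{.col}F"))

str(toy)
```

The .names argument keeps the original columns and appends the new factor columns, which matches the naming convention (Cash_UseF, Card_UseF) used in this section.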

5 Descriptive statistics

5.1 Numerical data

library(dplyr)

# Convert the Likert-scale columns (stored as character) to numeric;
# as.numeric() already preserves NA, so no ifelse() wrapper is needed
NLB_data <- NLB_data %>%
  mutate(across(c(20:66, 75:78), as.numeric))

# Select specific numerical data for summary
NLBdata_Likert <- NLB_data[, c("Save_Form", "Spend_SameForm", "Concern_Security", "LessSecure_Digital", 
                                "Trust_2FA", "Safe_CashCarry", "Prefer_Digital_Convenience", 
                                "Prefer_Cash_Control", "Use_Cash_IfNoDigital", "Importance_Ease", 
                                "Importance_Speed", "Importance_Availability", "Importance_Security", 
                                "Importance_TrackingBudgeting", "Importance_Privacy", "Safe_Cash", 
                                "Safe_DebitCard", "Safe_CreditCard", "Safe_PhonePay", "Safe_Neobank", 
                                "Easy_Cash", "Easy_DebitCard", "Easy_CreditCard", "Easy_PhonePay", 
                                "Easy_Neobank", "Accept_Cash", "Accept_DebitCard", "Accept_CreditCard", 
                                "Accept_PhonePay", "Accept_Neobank", "Fast_Cash", "Fast_DebitCard", 
                                "Fast_CreditCard", "Fast_PhonePay", "Fast_Neobank", "Private_Cash", 
                                "Private_DebitCard", "Private_CreditCard", "Private_PhonePay", 
                                "Private_Neobank", "Control_Cash", "Control_DebitCard", "Control_CreditCard", 
                                "Control_PhonePay", "Control_Neobank", "Social_Friends", "Social_Family", 
                                "Concern_Fraud", "Concern_PersonalInfo", "Concern_IDTheft", "Concern_Hacker")]

summary(NLBdata_Likert)

##    Save_Form      Spend_SameForm  Concern_Security LessSecure_Digital
##  Min.   :-2.000   Min.   :1.000   Min.   :1.000    Min.   :1.000     
##  1st Qu.: 1.000   1st Qu.:4.000   1st Qu.:1.000    1st Qu.:2.000     
##  Median : 4.000   Median :6.000   Median :3.000    Median :4.000     
##  Mean   : 3.395   Mean   :5.316   Mean   :3.115    Mean   :3.655     
##  3rd Qu.: 7.000   3rd Qu.:7.000   3rd Qu.:4.000    3rd Qu.:5.000     
##  Max.   : 7.000   Max.   :7.000   Max.   :7.000    Max.   :7.000     
##    Trust_2FA     Safe_CashCarry  Prefer_Digital_Convenience Prefer_Cash_Control
##  Min.   :1.000   Min.   :1.000   Min.   :1.000              Min.   :1.000      
##  1st Qu.:4.000   1st Qu.:3.000   1st Qu.:6.000              1st Qu.:1.000      
##  Median :6.000   Median :4.000   Median :7.000              Median :4.000      
##  Mean   :5.359   Mean   :4.135   Mean   :6.046              Mean   :3.441      
##  3rd Qu.:7.000   3rd Qu.:5.000   3rd Qu.:7.000              3rd Qu.:5.000      
##  Max.   :7.000   Max.   :7.000   Max.   :7.000              Max.   :7.000      
##  Use_Cash_IfNoDigital Importance_Ease Importance_Speed Importance_Availability
##  Min.   :1.000        Min.   :1.000   Min.   :1        Min.   :1.000          
##  1st Qu.:5.000        1st Qu.:6.000   1st Qu.:6        1st Qu.:6.000          
##  Median :6.000        Median :7.000   Median :7        Median :7.000          
##  Mean   :5.704        Mean   :6.105   Mean   :6        Mean   :6.168          
##  3rd Qu.:7.000        3rd Qu.:7.000   3rd Qu.:7        3rd Qu.:7.000          
##  Max.   :7.000        Max.   :7.000   Max.   :7        Max.   :7.000          
##  Importance_Security Importance_TrackingBudgeting Importance_Privacy
##  Min.   :1.000       Min.   :1.000                Min.   :1.000     
##  1st Qu.:5.000       1st Qu.:4.000                1st Qu.:4.000     
##  Median :7.000       Median :5.000                Median :6.000     
##  Mean   :6.076       Mean   :5.201                Mean   :5.658     
##  3rd Qu.:7.000       3rd Qu.:7.000                3rd Qu.:7.000     
##  Max.   :7.000       Max.   :7.000                Max.   :7.000     
##    Safe_Cash     Safe_DebitCard  Safe_CreditCard Safe_PhonePay  
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:4.000   1st Qu.:4.000   1st Qu.:4.000   1st Qu.:4.000  
##  Median :6.000   Median :5.000   Median :5.000   Median :5.000  
##  Mean   :5.503   Mean   :5.138   Mean   :4.947   Mean   :5.174  
##  3rd Qu.:7.000   3rd Qu.:6.000   3rd Qu.:6.000   3rd Qu.:7.000  
##  Max.   :7.000   Max.   :7.000   Max.   :7.000   Max.   :7.000  
##   Safe_Neobank     Easy_Cash     Easy_DebitCard  Easy_CreditCard Easy_PhonePay 
##  Min.   :1.000   Min.   :1.000   Min.   :3.000   Min.   :3.00    Min.   :1.00  
##  1st Qu.:4.000   1st Qu.:4.000   1st Qu.:5.000   1st Qu.:5.00    1st Qu.:7.00  
##  Median :5.000   Median :5.000   Median :7.000   Median :7.00    Median :7.00  
##  Mean   :4.826   Mean   :5.102   Mean   :6.148   Mean   :6.02    Mean   :6.48  
##  3rd Qu.:6.000   3rd Qu.:7.000   3rd Qu.:7.000   3rd Qu.:7.00    3rd Qu.:7.00  
##  Max.   :7.000   Max.   :7.000   Max.   :7.000   Max.   :7.00    Max.   :7.00  
##   Easy_Neobank    Accept_Cash    Accept_DebitCard Accept_CreditCard
##  Min.   :1.000   Min.   :1.000   Min.   :3.000    Min.   :2.000    
##  1st Qu.:4.000   1st Qu.:6.000   1st Qu.:6.000    1st Qu.:6.000    
##  Median :6.000   Median :7.000   Median :7.000    Median :7.000    
##  Mean   :5.592   Mean   :6.227   Mean   :6.484    Mean   :6.336    
##  3rd Qu.:7.000   3rd Qu.:7.000   3rd Qu.:7.000    3rd Qu.:7.000    
##  Max.   :7.000   Max.   :7.000   Max.   :7.000    Max.   :7.000    
##  Accept_PhonePay Accept_Neobank    Fast_Cash     Fast_DebitCard 
##  Min.   :1.00    Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:5.00    1st Qu.:4.000   1st Qu.:2.750   1st Qu.:5.000  
##  Median :7.00    Median :5.000   Median :4.000   Median :7.000  
##  Mean   :6.03    Mean   :5.076   Mean   :4.076   Mean   :6.174  
##  3rd Qu.:7.00    3rd Qu.:7.000   3rd Qu.:6.000   3rd Qu.:7.000  
##  Max.   :7.00    Max.   :7.000   Max.   :7.000   Max.   :7.000  
##  Fast_CreditCard Fast_PhonePay    Fast_Neobank    Private_Cash  
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:5.000   1st Qu.:7.000   1st Qu.:4.000   1st Qu.:4.000  
##  Median :7.000   Median :7.000   Median :6.000   Median :7.000  
##  Mean   :6.122   Mean   :6.526   Mean   :5.684   Mean   :5.681  
##  3rd Qu.:7.000   3rd Qu.:7.000   3rd Qu.:7.000   3rd Qu.:7.000  
##  Max.   :7.000   Max.   :7.000   Max.   :7.000   Max.   :7.000  
##  Private_DebitCard Private_CreditCard Private_PhonePay Private_Neobank
##  Min.   :1.000     Min.   :1.000      Min.   :1.000    Min.   :1.000  
##  1st Qu.:4.000     1st Qu.:4.000      1st Qu.:3.000    1st Qu.:4.000  
##  Median :4.000     Median :4.000      Median :4.000    Median :4.000  
##  Mean   :4.457     Mean   :4.457      Mean   :4.395    Mean   :4.484  
##  3rd Qu.:6.000     3rd Qu.:6.000      3rd Qu.:6.000    3rd Qu.:6.000  
##  Max.   :7.000     Max.   :7.000      Max.   :7.000    Max.   :7.000  
##   Control_Cash   Control_DebitCard Control_CreditCard Control_PhonePay
##  Min.   :1.000   Min.   :1.000     Min.   :1.000      Min.   :1.000   
##  1st Qu.:3.000   1st Qu.:4.000     1st Qu.:4.000      1st Qu.:4.000   
##  Median :5.000   Median :5.000     Median :5.000      Median :5.000   
##  Mean   :4.852   Mean   :5.155     Mean   :5.007      Mean   :5.161   
##  3rd Qu.:7.000   3rd Qu.:7.000     3rd Qu.:7.000      3rd Qu.:7.000   
##  Max.   :7.000   Max.   :7.000     Max.   :7.000      Max.   :7.000   
##  Control_Neobank Social_Friends  Social_Family   Concern_Fraud   
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :-1.000  
##  1st Qu.:4.000   1st Qu.:1.000   1st Qu.:1.000   1st Qu.: 4.000  
##  Median :5.000   Median :3.000   Median :4.000   Median : 5.000  
##  Mean   :5.115   Mean   :3.125   Mean   :3.339   Mean   : 4.507  
##  3rd Qu.:7.000   3rd Qu.:4.000   3rd Qu.:5.000   3rd Qu.: 6.000  
##  Max.   :7.000   Max.   :7.000   Max.   :7.000   Max.   : 7.000  
##  Concern_PersonalInfo Concern_IDTheft  Concern_Hacker  
##  Min.   :-1.000       Min.   :-1.000   Min.   :-1.000  
##  1st Qu.: 4.000       1st Qu.: 3.000   1st Qu.: 4.000  
##  Median : 4.000       Median : 4.000   Median : 5.000  
##  Mean   : 4.457       Mean   : 4.273   Mean   : 4.849  
##  3rd Qu.: 6.000       3rd Qu.: 6.000   3rd Qu.: 6.000  
##  Max.   : 7.000       Max.   : 7.000   Max.   : 7.000
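
The minimum values of -2 (Save_Form) and -1 (the four Concern_ variables) in the output above fall outside the 1 to 7 response scales. They are probably missing-value codes exported by the survey tool, although this is an assumption that should be checked against the questionnaire documentation. If so, they would need to be recoded before means are interpreted. A minimal sketch on toy data (the object names toy and toy_clean are hypothetical):

```r
# Assumption: negative values (e.g. -1, -2) are missing-value codes, not valid answers
toy <- data.frame(
  Save_Form     = c(-2, 1, 7, 4),
  Concern_Fraud = c(5, -1, 6, 4)
)

# Replace every negative code with NA
toy_clean <- toy
toy_clean[toy_clean < 0] <- NA

# Means are now computed on valid answers only
summary(toy_clean)
clean_means <- colMeans(toy_clean, na.rm = TRUE)
clean_means
```

With the negative codes left in place, the toy mean of Save_Form would be 2.5 rather than 4, which illustrates how such codes pull the averages downwards.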

5.1.1 Summary of key insights - Numerical data

5.1.1.1 Trust and Security Preferences

Trust in Security Measures: Two-factor authentication (2FA) clearly raises confidence in digital payments. Trust_2FA has a median of 6 and a mean of 5.359 on the 7-point agreement scale, so most respondents agree that features such as 2FA make them more confident in digital payment methods.
Security Concerns: Respondents report a moderate level of concern about digital payment security. Concern_PersonalInfo has a mean of 4.457 and a median of 4, and Concern_IDTheft is similar (mean 4.273, median 4), while Concern_Fraud (mean 4.507) and Concern_Hacker (mean 4.849) both have a median of 5. Loss of access due to a hacker attack is therefore the most pressing concern. These four variables have a minimum of -1, which lies outside the 1 to 7 scale and probably reflects a missing-value code, so the reported means are likely to be slightly understated.

5.1.1.2 Preferences for Cash and Digital Payments

Preference for Digital Convenience: Most participants prefer digital payments because they are more convenient and save time: Prefer_Digital_Convenience has a median of 7 and a mean of 6.046. This is consistent with the broader shift towards digital financial transactions.
Preference for Cash Control: Opinions on using cash to avoid overspending are divided. Prefer_Cash_Control has a mean of 3.441 and a median of 4, with a first quartile of 1 and a third quartile of 5. A sizeable minority therefore relies on cash for spending control, while at least a quarter of respondents strongly reject the idea.
Cash as a Fallback: Use_Cash_IfNoDigital has a mean of 5.704 and a median of 6, indicating that most respondents use cash only when digital payment is not possible. Cash is a fallback rather than a first choice.

5.1.1.3 Digital Payment Security

Security of Payment Methods: Perceived security differs across payment methods. Safe_CashCarry has a median of 4, whereas Safe_DebitCard and Safe_CreditCard have medians of 5, so respondents feel somewhat safer paying by card than carrying cash.
Cash vs. Digital Payment Security: Carrying cash is therefore not perceived as the safer option in this sample. Among the digital instruments the differences are small: in the descriptive statistics reported in Section 6, Safe_PhonePay (mean 5.17) is rated on a par with Safe_DebitCard (5.14) and slightly above Safe_CreditCard (4.95), while Safe_Neobank (4.83) is rated lowest.

5.1.1.4 Payment Speed

Preference for Fast Payments: Digital payments are rated as clearly faster than cash. Fast_Cash has a mean of 4.076, indicating that cash is perceived as only moderately fast, whereas Fast_CreditCard and Fast_PhonePay are rated highly, with means above 6. This is consistent with the demand for quick transactions among young consumers.

5.1.1.6 Conclusion

In summary, the numerical findings describe a sample that rates digital instruments highly for convenience and speed while holding moderate concerns about personal data, fraud, and identity theft. Trust in safeguards such as two-factor authentication is high, and cards are rated as safer than carrying cash, yet respondents remain willing to fall back on cash when digital payment is unavailable. Users’ social circles influence their preferences and decisions, underscoring the importance of peer-driven advice in financial matters. Overall, the results suggest that the combination of digital convenience and robust security measures will shape the payment decisions of young consumers.

5.2 Categorical data

# Select specific categorical data for summary
categorical_columns <- c("AgeF", "Cash_UseF", "Card_UseF", "Phone_UseF", "OtherPay_UseF", 
                         "Cash_Up10F", "Cash_11_99F", "Cash_100_1000F", "Cash_Over1000F", 
                         "NoDigital_ResponseF", "Income_StudJobSalaryF", "Income_PocketMoneyF", 
                         "Income_GiftsF", "Income_OccasionalF", "Income_SubsidyF", "Income_ScholarshipF", 
                         "Income_InvestmentsF", "Expense_SharingF", "OnlineFraud_ExpF", "Switch_DigitalF", 
                         "GenderF", "EducationF", "Status_EmploymentF", "Income_LevelF", "Primary_BankF")

# Use summary to view frequency counts for categorical columns
summary(NLB_data[, categorical_columns])

##       AgeF                   Cash_UseF                  Card_UseF  
##  23     :64   Never               : 23   Never               : 22  
##  22     :44   1-3 monthly         :133   1-3 monthly         : 62  
##  25     :43   1 per week          : 66   1 per week          : 27  
##  20     :40   Several times a week: 65   Several times a week:125  
##  24     :30   Daily               : 17   Daily               : 68  
##  21     :23                                                        
##  (Other):60                                                        
##                 Phone_UseF               OtherPay_UseF          Cash_Up10F 
##  Never               : 81   Never               :180   Never         : 34  
##  1-3 monthly         : 24   1-3 monthly         : 96   Less than half:169  
##  1 per week          : 20   1 per week          : 15   Half          : 55  
##  Several times a week: 71   Several times a week: 12   More than half: 31  
##  Daily               :108   Daily               :  1   Always        : 15  
##                                                                            
##                                                                            
##          Cash_11_99F         Cash_100_1000F        Cash_Over1000F
##  Never         : 88   Never         :197    Never         :252   
##  Less than half:153   Less than half: 68    Less than half: 21   
##  Half          : 37   Half          : 16    Half          :  8   
##  More than half: 18   More than half: 17    More than half:  8   
##  Always        :  8   Always        :  6    Always        : 15   
##                                                                  
##                                                                  
##             NoDigital_ResponseF          Income_StudJobSalaryF
##  Pay digital elsewhere: 52      Cash                : 10      
##  Pay as available     :195      Cash&Digitally      : 43      
##  Never occurred       : 42      Digitally           :217      
##  Other                : 15      Not using           : 34      
##                                 Don't want to answer:  0      
##                                                               
##                                                               
##            Income_PocketMoneyF              Income_GiftsF
##  Cash                : 76      Cash                :251  
##  Cash&Digitally      : 63      Cash&Digitally      : 36  
##  Digitally           : 57      Digitally           :  3  
##  Not using           :107      Not using           : 13  
##  Don't want to answer:  1      Don't want to answer:  1  
##                                                          
##                                                          
##             Income_OccasionalF             Income_SubsidyF
##  Cash                : 88      Cash                :  1   
##  Cash&Digitally      : 17      Cash&Digitally      :  1   
##  Digitally           :  3      Digitally           : 62   
##  Not using           :187      Not using           :235   
##  Don't want to answer:  9      Don't want to answer:  5   
##                                                           
##                                                           
##            Income_ScholarshipF           Income_InvestmentsF
##  Cash                :  0      Cash                :  2     
##  Cash&Digitally      :  2      Cash&Digitally      :  2     
##  Digitally           :137      Digitally           : 93     
##  Not using           :164      Not using           :203     
##  Don't want to answer:  1      Don't want to answer:  4     
##                                                             
##                                                             
##       Expense_SharingF     OnlineFraud_ExpF             Switch_DigitalF
##  Cash         : 28     Yes - me    : 22     Fully digital       : 78   
##  Mobile apps  :250     Yes - others: 96     Balance digital-cash:171   
##  Bank transfer: 14     Yes - both  : 11     Cash                : 44   
##  Don't share  :  8     No          :172     Don't know          : 10   
##  Other        :  4     NA's        :  3     NA's                :  1   
##                                                                        
##                                                                        
##                  GenderF                   EducationF      Status_EmploymentF
##  Man                 :120   Unfinished primary  :  0   Student      :236     
##  Woman               :182   Primary school      :  8   Employed     : 49     
##  Other               :  1   Vocational education:  2   Self-employed:  7     
##  Don't want to answer:  1   High School         :132   Unemployed   :  3     
##                             Bachelor Degree     : 99   Other        :  9     
##                             Master Degree       : 61                         
##                             PhD                 :  2                         
##         Income_LevelF             Primary_BankF
##  0-200 EUR     :48    NLB                :121  
##  201-500 EUR   :91    OTP                :109  
##  501-800 EUR   :58    Intesa Sanpaolo    : 17  
##  801-1300 EUR  :48    Sparkasse          :  7  
##  Above 1300 EUR:57    Addiko Bank        :  8  
##  NA's          : 2    Delavska Hranilnica: 19  
##                       Other              : 23

5.2.1 Summary of key insights - Categorical data

5.2.1.1 Payment Frequency and Method Preferences

Cash Usage:
Cash is used by almost all respondents, but mostly infrequently. The largest group (133 respondents) uses cash one to three times a month, 66 use it about once a week, 65 several times a week, and only 17 daily. Just 23 respondents never pay in cash, so cash remains part of the payment mix for 281 of the 304 respondents, even though it is seldom used daily.
Card Usage:
For card payments, the most common frequency is several times a week (125 participants), followed by daily usage (68 participants), highlighting that many users rely heavily on cards for transactions. Only 22 respondents never use cards for payments.
Phone Payments:
Phone payments show a polarised pattern. They have the highest daily usage of any method (108 respondents), with a further 71 respondents paying by phone several times a week, yet 81 respondents never use them. The size of the non-user group could reflect concerns about mobile payment security or simply a preference for other methods.
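
Raw counts are easier to compare across payment methods once they are expressed as shares of the sample. The sketch below converts the frequency counts from the summary output above into row percentages with `prop.table()`; the object names are chosen for illustration only.

```r
# Frequency counts copied from the summary output above
levels_use <- c("Never", "1-3 monthly", "1 per week",
                "Several times a week", "Daily")
counts <- rbind(
  Cash  = c(23, 133, 66, 65, 17),
  Card  = c(22, 62, 27, 125, 68),
  Phone = c(81, 24, 20, 71, 108)
)
colnames(counts) <- levels_use

# Row percentages: share of respondents in each frequency category per method
pct <- round(100 * prop.table(counts, margin = 1), 1)
print(pct)

# Share of respondents using each method at least several times a week
frequent <- rowSums(pct[, c("Several times a week", "Daily")])
print(frequent)
```

Expressed this way, the contrast between frequent card or phone use and occasional cash use can be read directly from the last two columns.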

5.2.1.2 Cash Spend and Usage Distribution

Cash Usage by Purchase Amount:
The share of purchases paid in cash falls as the purchase amount rises:
- Up to 10 EUR: Only 34 respondents never pay in cash, while 169 do so for less than half of such purchases.
- 11 EUR to 99 EUR: 153 respondents pay in cash for less than half of such purchases, 88 never do, and only 8 always do.
- 100 EUR to 1000 EUR: 197 respondents never pay in cash.
- Above 1000 EUR: 252 respondents never pay in cash, although 15 report always doing so.
Other Payment Methods:
Payment methods other than cash, cards, and phones are used infrequently: 180 respondents never use them, 96 use them one to three times a month, and only 28 use them weekly or more often.

5.2.1.3 Income Sources and Digital vs. Cash Payments

Income and Payment Preferences:
The form in which income is received depends strongly on its source:
- Cash dominates informal sources: 251 respondents receive gifts in cash, 88 receive occasional income in cash, and 76 receive pocket money in cash.
- Digital receipt dominates formal sources: student job or salary income (217 respondents), scholarships (137), investments (93), and subsidies (62) are received almost exclusively digitally.
Response When Digital Payment Is Unavailable:
Most respondents (195) pay with whatever method is available, whereas 52 prefer to go elsewhere in order to pay digitally and 42 report never having faced this situation.

5.2.1.4 Spending Habits and Financial Control

Sharing Expenses:
Mobile apps are commonly used for expense-sharing, with 250 respondents using them for this purpose. A smaller group of 28 respondents still use cash for sharing expenses, while 8 respondents do not share their expenses at all.
Switch to Digital Payments:
There is a notable shift toward digital payments, with 78 participants fully digital and 171 participants balancing digital and cash. However, 44 respondents still primarily use cash for their payments.

5.2.1.5 Demographic Breakdown

Gender Distribution:
The gender distribution in the sample is 120 males and 182 females, with a small portion of participants marking their gender as other or not wanting to answer.
Educational Background:
A significant number of participants have completed high school (132 respondents), with 99 respondents holding a bachelor’s degree, and a smaller portion possessing a master’s degree (61 respondents). The educational distribution indicates a predominantly young, educated sample, which could influence financial decisions and preferences.
Employment Status:
The majority of the participants are students (236 respondents), followed by employed individuals (49 respondents). The data suggests that many participants may be financially dependent, influencing their preferences toward cash, digital payments, and financial control.
Income Levels:
Reported incomes are spread across all brackets. The most common bracket is 201-500 EUR (91 respondents), and the median respondent falls in the 501-800 EUR bracket. At the extremes, 48 respondents report 0-200 EUR and 57 report more than 1300 EUR, while two respondents have no recorded income level.
Primary Bank Usage:
NLB and OTP are the two most used banks among respondents, with 121 and 109 participants, respectively, favoring these institutions. Other banks like Intesa Sanpaolo and Sparkasse are used by significantly fewer respondents, indicating a reliance on a few major banks for digital transactions.
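
For an ordered variable such as the income bracket, the median and modal categories can be read from the cumulative counts. The following sketch uses the Income_LevelF counts from the output above (the two missing values are excluded); the object names are illustrative.

```r
# Counts per income bracket copied from the Income_LevelF output (2 NAs excluded)
income <- c("0-200 EUR" = 48, "201-500 EUR" = 91, "501-800 EUR" = 58,
            "801-1300 EUR" = 48, "Above 1300 EUR" = 57)

n_valid   <- sum(income)                # respondents with a recorded income level
cum_share <- cumsum(income) / n_valid   # cumulative share across ordered brackets

# The median bracket is the first one whose cumulative share reaches 50%
median_bracket <- names(income)[which(cum_share >= 0.5)[1]]
modal_bracket  <- names(income)[which.max(income)]

print(round(cum_share, 3))
print(c(median = median_bracket, mode = modal_bracket))
```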

5.2.1.6 Conclusion

The categorical analysis shows that digital payments, especially by card and phone, are used far more frequently than cash. Cash nevertheless remains relevant: nearly all respondents use it at least occasionally, and it is the main form in which gifts and pocket money are received. Because the sample consists largely of students with modest incomes, these patterns describe a young and still partly financially dependent population.

Mobile apps are clearly the preferred tool for dividing costs among peers. How payment preferences vary with age, income level, and primary bank cannot be established from the frequency tables alone, which nonetheless point to a growing but cautious adoption of digital financial solutions among younger, student populations.

6 PCA Creation

The decision to use Principal Component Analysis (PCA) as the initial step in the analytical sequence addresses several methodological considerations identified during the exploratory assessment of the NLB dataset.

Dimensionality Reduction and Management of Analytical Complexity

The original dataset contains a substantial number of metric variables reflecting young individuals’ perceptions of payment instruments. The PCA is based on 24 of these variables, which rate four digital instruments (debit cards, credit cards, smartphone payments, and neobanks) on six attributes: security, ease of use, availability, speed, privacy, and spending control. Treating each of the 24 variables individually would have made the analysis both diffuse and redundant. A separate PCA was therefore run on the four items of each attribute, and the first principal component was retained in each case. This yields six component scores (PCASafety, PCAEase, PCAAvailability, PCASpeed, PCAPrivacy, PCAControl) that summarise respondents’ perceptions while preserving the substantive informational content of the dataset.

Mitigation of Multicollinearity

Previous research in the field of digital payments indicates that variables such as “ease of use” and “speed” are often highly correlated, as systems perceived as user-friendly are frequently considered efficient. In our dataset, the biplot interpretation confirms that instruments such as cards and mobile payments cluster according to shared efficiency characteristics. Within each attribute, PCA replaces four correlated items with a single score, which removes this redundancy and prevents near-duplicate variables from exerting undue influence on subsequent multivariate analyses such as Cluster Analysis. Because the six scores come from separate PCAs, they are not orthogonal to one another by construction, so correlations between them (for example between PCAEase and PCASpeed) can remain.
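
Orthogonality holds among the components of a single PCA. When several PCAs are run on separate blocks of items, the first components of different blocks can still be correlated. The following minimal sketch illustrates the point on simulated data with base R’s `prcomp()`; the latent structure and all object names are assumptions made for the illustration and do not reproduce the NLB data.

```r
set.seed(1)
n <- 300

# Simulated respondents: one shared latent attitude drives both blocks (assumption)
latent <- rnorm(n)
ease  <- sapply(1:4, function(i) latent + rnorm(n, sd = 0.7))
speed <- sapply(1:4, function(i) latent + rnorm(n, sd = 0.7))

# One PCA per block, keeping only the first component of each
pc_ease  <- prcomp(ease,  scale. = TRUE)$x[, 1]
pc_speed <- prcomp(speed, scale. = TRUE)$x[, 1]

# Components within one PCA are uncorrelated ...
within_block <- cor(prcomp(ease, scale. = TRUE)$x[, 1:2])[1, 2]
# ... but first components from two separate PCAs need not be
# (the sign of a component is arbitrary, hence the absolute value)
between_blocks <- abs(cor(pc_ease, pc_speed))

print(round(c(within = within_block, between = between_blocks), 3))
```

A correlation matrix of the six component scores would show whether this matters in practice before the scores are passed to a distance-based method.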

Construction of a Perception Map (Visual Interpretation)

In addition to data reduction, PCA was employed for its capacity to generate a perceptual map. The first two dimensions (Dim1 and Dim2) together account for 93.1% of the total variance, providing sufficient fidelity to visually represent the psychological trade-offs faced by young consumers:

Dimension 1 delineates the contrast between traditional (cash) and modern (digital) payment methods.
Dimension 2 captures the trade-off between the convenience of digital solutions and concerns about security and privacy.

This visualisation is consistent with findings in the literature (Usman et al., 2025), which highlight financial literacy and perceived behavioural control as primary drivers of digital adoption, while perceived risk continues to act as a deterrent.

Identification of Latent Constructs Substantiated by Prior Research

The extraction of components related to security and spending control finds empirical support in studies such as Puusniekka (2020), which highlight the enduring perception of cash as a “sacred” instrument for tangible budgetary control. Through PCA, these latent constructs were statistically isolated, confirming that cash remains strongly associated with privacy while exhibiting negative correlations with speed and availability.

NLB_PCASafety <- NLB_data[ , c("Safe_DebitCard", "Safe_CreditCard", "Safe_PhonePay", "Safe_Neobank")]

NLB_PCAEase <- NLB_data[ , c("Easy_DebitCard", "Easy_CreditCard", "Easy_PhonePay", "Easy_Neobank")]

NLB_PCAAvailability <- NLB_data[ , c("Accept_DebitCard", "Accept_CreditCard", "Accept_PhonePay", "Accept_Neobank")]

NLB_PCASpeed <- NLB_data[ , c("Fast_DebitCard", "Fast_CreditCard", "Fast_PhonePay", "Fast_Neobank")]

NLB_PCAPrivacy <- NLB_data[ , c("Private_DebitCard", "Private_CreditCard", "Private_PhonePay", "Private_Neobank")]

NLB_PCAControl <- NLB_data[ , c("Control_DebitCard", "Control_CreditCard", "Control_PhonePay", "Control_Neobank")]

library(pastecs)

round(stat.desc(NLB_PCASafety, basic = FALSE), 2)

##              Safe_DebitCard Safe_CreditCard Safe_PhonePay Safe_Neobank
## median                 5.00            5.00          5.00         5.00
## mean                   5.14            4.95          5.17         4.83
## SE.mean                0.07            0.08          0.09         0.09
## CI.mean.0.95           0.14            0.15          0.17         0.17
## var                    1.58            1.85          2.28         2.30
## std.dev                1.26            1.36          1.51         1.52
## coef.var               0.25            0.28          0.29         0.31

round(stat.desc(NLB_PCAEase, basic = FALSE), 2)

##              Easy_DebitCard Easy_CreditCard Easy_PhonePay Easy_Neobank
## median                 7.00            7.00          7.00         6.00
## mean                   6.15            6.02          6.48         5.59
## SE.mean                0.06            0.07          0.06         0.09
## CI.mean.0.95           0.13            0.14          0.12         0.17
## var                    1.26            1.45          1.22         2.31
## std.dev                1.12            1.20          1.10         1.52
## coef.var               0.18            0.20          0.17         0.27

round(stat.desc(NLB_PCAAvailability, basic = FALSE), 2)

##              Accept_DebitCard Accept_CreditCard Accept_PhonePay Accept_Neobank
## median                   7.00              7.00            7.00           5.00
## mean                     6.48              6.34            6.03           5.08
## SE.mean                  0.05              0.06            0.07           0.10
## CI.mean.0.95             0.11              0.13            0.14           0.19
## var                      0.88              1.24            1.51           2.81
## std.dev                  0.94              1.11            1.23           1.68
## coef.var                 0.14              0.18            0.20           0.33

round(stat.desc(NLB_PCASpeed, basic = FALSE), 2)

##              Fast_DebitCard Fast_CreditCard Fast_PhonePay Fast_Neobank
## median                 7.00            7.00          7.00         6.00
## mean                   6.17            6.12          6.53         5.68
## SE.mean                0.07            0.07          0.06         0.08
## CI.mean.0.95           0.13            0.14          0.12         0.17
## var                    1.34            1.47          1.08         2.18
## std.dev                1.16            1.21          1.04         1.48
## coef.var               0.19            0.20          0.16         0.26

round(stat.desc(NLB_PCAPrivacy, basic = FALSE), 2)

##              Private_DebitCard Private_CreditCard Private_PhonePay
## median                    4.00               4.00             4.00
## mean                      4.46               4.46             4.39
## SE.mean                   0.10               0.10             0.10
## CI.mean.0.95              0.20               0.20             0.20
## var                       3.01               2.99             3.22
## std.dev                   1.73               1.73             1.80
## coef.var                  0.39               0.39             0.41
##              Private_Neobank
## median                  4.00
## mean                    4.48
## SE.mean                 0.09
## CI.mean.0.95            0.19
## var                     2.71
## std.dev                 1.64
## coef.var                0.37

round(stat.desc(NLB_PCAControl, basic = FALSE), 2)

##              Control_DebitCard Control_CreditCard Control_PhonePay
## median                    5.00               5.00             5.00
## mean                      5.15               5.01             5.16
## SE.mean                   0.10               0.10             0.11
## CI.mean.0.95              0.19               0.20             0.21
## var                       2.92               3.03             3.38
## std.dev                   1.71               1.74             1.84
## coef.var                  0.33               0.35             0.36
##              Control_Neobank
## median                  5.00
## mean                    5.12
## SE.mean                 0.10
## CI.mean.0.95            0.19
## var                     2.84
## std.dev                 1.69
## coef.var                0.33

library(FactoMineR)

# One PCA per attribute block: variables are standardised (scale.unit = TRUE)
# and only the first principal component is kept (ncp = 1)
components10 <- PCA(NLB_PCASafety,       scale.unit = TRUE, graph = FALSE, ncp = 1)
components11 <- PCA(NLB_PCAEase,         scale.unit = TRUE, graph = FALSE, ncp = 1)
components12 <- PCA(NLB_PCAAvailability, scale.unit = TRUE, graph = FALSE, ncp = 1)
components13 <- PCA(NLB_PCASpeed,        scale.unit = TRUE, graph = FALSE, ncp = 1)
components14 <- PCA(NLB_PCAPrivacy,      scale.unit = TRUE, graph = FALSE, ncp = 1)
components15 <- PCA(NLB_PCAControl,      scale.unit = TRUE, graph = FALSE, ncp = 1)

NLB_data$PCASafety <- components10$ind$coord[ , 1]
NLB_data$PCAEase <- components11$ind$coord[ , 1]
NLB_data$PCAAvailability <- components12$ind$coord[ , 1]
NLB_data$PCASpeed <- components13$ind$coord[ , 1]
NLB_data$PCAPrivacy <- components14$ind$coord[ , 1]
NLB_data$PCAControl <- components15$ind$coord[ , 1]
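
Retaining only the first component of each block is easier to defend when that component captures most of the block’s variance. The sketch below shows how this could be checked, using base R’s `prcomp()` on a simulated block of four items; in FactoMineR the corresponding eigenvalues and percentages are stored in the `eig` element of the result (for example `components10$eig`). The simulated data and object names are assumptions for illustration only.

```r
set.seed(42)
n <- 304

# Simulated block of four correlated 7-point items (illustrative only)
base_score <- rnorm(n)
block <- sapply(1:4, function(i) {
  raw <- 4.5 + 1.2 * base_score + rnorm(n, sd = 1)
  pmin(pmax(round(raw), 1), 7)          # clamp to the 1-7 scale
})
colnames(block) <- paste0("Item", 1:4)

pca <- prcomp(block, scale. = TRUE)

# Eigenvalues of the correlation matrix and share of variance per component
eigenvalues <- pca$sdev^2
share <- eigenvalues / sum(eigenvalues)
print(round(rbind(eigenvalues, share), 3))

# First-component scores, one value per respondent
score <- pca$x[, 1]
```

With standardised items the eigenvalues sum to the number of items, so a first eigenvalue well above 1 and a high first share would support the single-component summary.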

head(NLB_data)

## # A tibble: 6 × 123
##   Age   Cash_Use Card_Use Phone_Use OtherPay_Use Cash_Up10 Cash_11_99
##   <chr> <chr>    <chr>    <chr>     <chr>        <chr>     <chr>     
## 1 7     3        2        5         2            4         3         
## 2 9     2        5        1         1            2         1         
## 3 9     2        4        4         1            2         2         
## 4 7     3        5        5         1            2         2         
## 5 7     2        5        5         1            2         1         
## 6 9     3        3        1         2            4         3         
## # ℹ 116 more variables: Cash_100_1000 <chr>, Cash_Over1000 <chr>,
## #   NoDigital_Response <chr>, NoDigital_Response_Text <chr>,
## #   Income_StudJobSalary <chr>, Income_PocketMoney <chr>, Income_Gifts <chr>,
## #   Income_Occasional <chr>, Income_Subsidy <chr>, Income_Scholarship <chr>,
## #   Income_Investments <chr>, Save_Money <chr>, Save_Form <dbl>,
## #   Spend_SameForm <dbl>, Concern_Security <dbl>, LessSecure_Digital <dbl>,
## #   Trust_2FA <dbl>, Safe_CashCarry <dbl>, Prefer_Digital_Convenience <dbl>, …

7 Clustering

Following the reduction of dataset complexity through PCA, the second phase of the analytical sequence involved applying Cluster Analysis. This technique was chosen to address the limitations of mean-based analyses and to capture the intrinsic heterogeneity of Generation Z regarding digital payment behaviours.

In line with Puusniekka (2020), young individuals do not form a homogeneous group: their path towards financial independence is shaped by personal preferences, family habits, and varying levels of digital competence. The primary aim of the clustering procedure is therefore to identify distinct behavioural profiles, enabling the bank to develop targeted strategies and move from a generic understanding of the population to segmentation grounded in the latent perceptions extracted through PCA.

To ensure the robustness of the results, a two-step procedure was adopted:

Ward’s Method (Hierarchical): Initially used to explore the underlying structure of the data and to visualise relationships via a dendrogram. This method minimises within-cluster variance, assisting in identifying the natural number of segments.
K-means Algorithm (Non-hierarchical): Once the number of clusters had been determined through the Elbow Method and Silhouette Analysis, which suggested a four-cluster solution, the K-means algorithm was applied to optimise the classification of the 304 respondents included in the cluster analysis.
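
The two-step logic can be sketched in base R as follows. This is an illustrative variant on simulated data, not the procedure used in this report: here the Ward cluster means are passed to `kmeans()` as starting centres, whereas the analysis below relies on 25 random starts. All object names and the simulated group structure are assumptions.

```r
set.seed(123)

# Simulated standardised scores for 304 respondents on six dimensions,
# built around four artificial group centres (illustrative only)
centres <- matrix(rnorm(4 * 6, sd = 2), nrow = 4)
truth <- sample(1:4, 304, replace = TRUE)
X <- scale(centres[truth, ] + matrix(rnorm(304 * 6), ncol = 6))

# Step 1: Ward's hierarchical clustering on Euclidean distances
ward <- hclust(dist(X, method = "euclidean"), method = "ward.D2")
ward_groups <- cutree(ward, k = 4)

# Step 2: k-means started from the Ward cluster means
init <- apply(X, 2, function(col) tapply(col, ward_groups, mean))
km <- kmeans(X, centers = init)

# Cross-tabulate the two solutions to see how many cases were reassigned
print(table(Ward = ward_groups, Kmeans = km$cluster))
```

Starting from the hierarchical solution keeps the cluster labels aligned between the two steps, which makes the cross-tabulation straightforward to read.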

NLB_CluStd <- as.data.frame(scale(NLB_data[c("PCASafety", "PCAEase", "PCAAvailability", "PCASpeed", "PCAPrivacy", "PCAControl")]))

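# Distance of each respondent from the origin of the standardised space (outlier check).
# Note: stored in NLB_CluStd, the column also enters the distance and clustering steps below.
NLB_CluStd$Dissimilarity <- sqrt(NLB_CluStd$PCASafety^2 + NLB_CluStd$PCAEase^2 + NLB_CluStd$PCAAvailability^2 + NLB_CluStd$PCASpeed^2 + NLB_CluStd$PCAPrivacy^2 + NLB_CluStd$PCAControl^2)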
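
The dissimilarity score is each respondent’s Euclidean distance from the origin of the standardised space and is useful for flagging unusual profiles. When it is stored as a column of the clustering data frame, later distance calculations treat it as a seventh variable, which is why it appears among the cluster means in the k-means output. The sketch below, on simulated data with hypothetical object names, shows one way to keep the score in a separate vector so that distances rest on the six component scores only.

```r
set.seed(7)

# Simulated stand-in for the six standardised component scores
scores <- as.data.frame(scale(matrix(rnorm(304 * 6), ncol = 6)))
names(scores) <- c("PCASafety", "PCAEase", "PCAAvailability",
                   "PCASpeed", "PCAPrivacy", "PCAControl")

# Distance of each respondent from the origin, kept OUTSIDE the data frame
dissimilarity <- sqrt(rowSums(scores^2))

# The largest values point to respondents with unusual profiles
print(head(sort(dissimilarity, decreasing = TRUE), 5))

# Distances based on the six scores only versus six scores plus the extra column
d_six   <- as.vector(dist(scores))
d_seven <- as.vector(dist(cbind(scores, Dissimilarity = dissimilarity)))
print(max(abs(d_seven - d_six)) > 0)   # TRUE: the extra column changes the distances
```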

library(factoextra)

Distances <- get_dist(NLB_CluStd,
                      method = "euclidean")

fviz_dist(Distances, 
          gradient = list(low = "#230078",    # NLB INDIGO BLUE
                          mid = "#A7A8AA",    # NLB LIGHT GRAY
                          high = "white"))

library(dplyr)

# Give the six component scores readable names for the plots and tables below
NLB_CluStd <- NLB_CluStd %>% rename(Security = PCASafety, 
                                    `Ease of use` = PCAEase, 
                                    Availability = PCAAvailability, 
                                    Speed = PCASpeed, 
                                    Privacy = PCAPrivacy, 
                                    `Spending Control` = PCAControl)

library(factoextra)
get_clust_tendency(NLB_CluStd,
                   n = nrow(NLB_CluStd) -1,
                   graph = FALSE)

## $hopkins_stat
## [1] 0.6785538
## 
## $plot
## NULL

library(dplyr)
library(factoextra)
WARD <- NLB_CluStd %>%
  get_dist(method = "euclidean") %>%
  hclust(method = "ward.D2")

WARD

## 
## Call:
## hclust(d = ., method = "ward.D2")
## 
## Cluster method   : ward.D2 
## Distance         : euclidean 
## Number of objects: 304

library(factoextra)
fviz_dend(WARD,
          k=3,
          cex = 0.5,
          palette = "jama",
          color_labels_by_k = TRUE,
          rect = TRUE)

library(factoextra)
library(NbClust)
fviz_nbclust(NLB_CluStd, kmeans, method = "wss") +
  labs(subtitle = "Elbow Method")

fviz_nbclust(NLB_CluStd, kmeans, method = "silhouette") +
  labs(subtitle = "Silhouette analysis")

Clustering <- kmeans(NLB_CluStd,
                     centers = 4,
                     nstart = 25)
Clustering

## K-means clustering with 4 clusters of sizes 63, 93, 103, 45
## 
## Cluster means:
##     Security Ease of use Availability       Speed    Privacy Spending Control
## 1  0.4866563   0.4709095  -0.15616090  0.24994550  0.8603929       -0.8773218
## 2 -0.7517767   0.7902180  -0.57596893  0.69484422 -0.8043929        0.5618051
## 3  0.1430845  -0.5409715   0.04180835 -0.07174967 -0.0235824        0.1264165
## 4  0.5448485  -1.0541669   1.31326640 -1.62170805  0.5118394       -0.2221667
##   Dissimilarity
## 1      2.469170
## 2      2.324214
## 3      1.698921
## 4      3.443997
## 
## Clustering vector:
##   [1] 2 1 2 3 2 2 3 3 3 3 2 3 1 3 2 2 1 3 3 2 1 2 2 4 3 3 3 2 2 4 1 4 3 3 3 2 2
##  [38] 4 2 2 1 2 4 3 3 4 1 3 3 2 4 3 2 2 1 3 2 2 1 3 2 4 1 1 1 1 2 3 3 2 4 1 3 2
##  [75] 2 1 2 1 3 3 1 4 3 1 1 3 3 3 2 3 4 2 1 1 2 2 3 2 4 1 1 2 2 2 3 4 1 2 2 1 3
## [112] 1 1 1 2 3 1 4 2 3 2 1 4 4 1 2 3 2 2 3 1 1 4 2 3 3 4 1 1 4 3 3 2 2 2 4 2 2
## [149] 1 3 2 3 3 1 3 3 2 3 3 2 2 3 2 4 3 3 2 1 4 4 4 2 4 1 2 4 1 1 3 2 3 3 2 3 1
## [186] 3 3 3 4 1 4 3 2 3 4 3 3 3 1 3 1 4 1 4 2 4 4 4 4 3 3 3 2 1 2 3 1 3 2 3 1 3
## [223] 3 3 3 4 2 2 4 2 2 1 2 3 3 2 2 2 2 3 2 3 1 1 3 3 2 2 2 2 2 2 2 4 2 1 3 3 1
## [260] 1 1 3 2 2 3 2 3 3 3 3 2 2 3 3 3 3 3 3 3 3 3 3 2 1 1 1 1 4 1 4 4 2 2 1 2 3
## [297] 3 4 1 4 3 4 2 4
## 
## Within cluster sum of squares by cluster:
## [1] 301.0014 275.2221 329.2284 299.6181
##  (between_SS / total_SS =  40.2 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"
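
The ratio between_SS / total_SS printed by `kmeans()` is the share of total variation explained by the partition. The sketch below, on simulated data with illustrative object names, shows how the ratio is obtained from the components listed above and how `set.seed()` can be used to make the random starts repeatable.

```r
set.seed(2024)

# Simulated standardised data with some group structure (illustrative only)
X <- scale(rbind(
  matrix(rnorm(100 * 6, mean = -1), ncol = 6),
  matrix(rnorm(100 * 6, mean =  0), ncol = 6),
  matrix(rnorm(104 * 6, mean =  1), ncol = 6)
))

# Fixing the seed makes the random starts, and hence the cluster labels, repeatable
set.seed(1)
km <- kmeans(X, centers = 4, nstart = 25)

# Share of total variation explained by the partition, as printed by kmeans()
explained <- km$betweenss / km$totss
print(round(100 * explained, 1))

# Identity behind the ratio: total SS = between SS + pooled within SS
print(all.equal(km$totss, km$betweenss + km$tot.withinss))
```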

library(factoextra)

# Define the NLB colors
NLB_colors <- c("#230078", "#84BD00", "#FA7800", "#63666A", "#A7A8AA", "black", "orange")

fviz_cluster(Clustering, 
             palette = NLB_colors, # Use NLB colors for clusters
             repel = FALSE,
             ggtheme = theme_bw(), # Black and white theme
             data = NLB_CluStd)

Averages <- Clustering$centers
Averages

##     Security Ease of use Availability       Speed    Privacy Spending Control
## 1  0.4866563   0.4709095  -0.15616090  0.24994550  0.8603929       -0.8773218
## 2 -0.7517767   0.7902180  -0.57596893  0.69484422 -0.8043929        0.5618051
## 3  0.1430845  -0.5409715   0.04180835 -0.07174967 -0.0235824        0.1264165
## 4  0.5448485  -1.0541669   1.31326640 -1.62170805  0.5118394       -0.2221667
##   Dissimilarity
## 1      2.469170
## 2      2.324214
## 3      1.698921
## 4      3.443997

Figure <- as.data.frame(Averages)
Figure$id <- 1:nrow(Figure)

library(tidyr)

Figure <- pivot_longer(Figure, cols = c("Security", "Ease of use", "Availability", "Speed", "Privacy", "Spending Control"))

# Cluster id as a factor; the k-means solution has four clusters
Figure$Group <- factor(Figure$id, 
                       levels = c(1, 2, 3, 4), 
                       labels = c("1", "2", "3", "4"))

Figure$ImeF <- factor(Figure$name, 
              levels = c("Security", "Ease of use", "Availability", "Speed", "Privacy", "Spending Control"), 
              labels = c("Security", "Ease of use", "Availability", "Speed", "Privacy", "Spending Control"))


library(ggplot2)
ggplot(Figure, aes(x = ImeF, y = value)) +
  geom_hline(yintercept = 0) +
  theme_bw() +
  geom_point(aes(shape = Group, col = Group), size = 3) +
  geom_line(aes(group = id), linewidth = 1) +
  ylab("Averages") +
  xlab("Cluster variables") +
  scale_color_manual(values = NLB_colors) +  # Use NLB colors for points and lines
  ylim(-3, 3) +
  theme(axis.text.x = element_text(angle = 45, vjust = 0.50, size = 10))

NLB_CluStd$Group <- Clustering$cluster

fit <- aov(cbind(`Security`, `Ease of use`, `Availability`, `Speed`, `Privacy`, `Spending Control`) ~ as.factor(Group), 
           data = NLB_CluStd)

summary(fit)

##  Response Security :
##                   Df  Sum Sq Mean Sq F value    Pr(>F)    
## as.factor(Group)   3  82.949 27.6495  37.695 < 2.2e-16 ***
## Residuals        300 220.051  0.7335                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response Ease of use :
##                   Df Sum Sq Mean Sq F value    Pr(>F)    
## as.factor(Group)   3 152.19  50.731  100.92 < 2.2e-16 ***
## Residuals        300 150.81   0.503                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response Availability :
##                   Df Sum Sq Mean Sq F value    Pr(>F)    
## as.factor(Group)   3 110.18  36.726   57.14 < 2.2e-16 ***
## Residuals        300 192.82   0.643                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response Speed :
##                   Df Sum Sq Mean Sq F value    Pr(>F)    
## as.factor(Group)   3 167.71  55.905  123.97 < 2.2e-16 ***
## Residuals        300 135.29   0.451                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response Privacy :
##                   Df Sum Sq Mean Sq F value    Pr(>F)    
## as.factor(Group)   3 118.66  39.553   64.37 < 2.2e-16 ***
## Residuals        300 184.34   0.614                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response Spending Control :
##                   Df  Sum Sq Mean Sq F value    Pr(>F)    
## as.factor(Group)   3  81.711 27.2370  36.925 < 2.2e-16 ***
## Residuals        300 221.289  0.7376                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
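
All six F-tests are significant, so the p-values alone do not show which variables separate the clusters most. A minimal sketch of one way to compare them is eta squared (between-group sum of squares divided by the total), computed here directly from the sums of squares printed above. The data frame `ss` and its column names are introduced only for this illustration.

```r
# Sums of squares copied from the ANOVA output above
ss <- data.frame(
  variable = c("Security", "Ease of use", "Availability",
               "Speed", "Privacy", "Spending Control"),
  between  = c(82.949, 152.19, 110.18, 167.71, 118.66, 81.711),
  within   = c(220.051, 150.81, 192.82, 135.29, 184.34, 221.289)
)

# Eta squared: share of each variable's total variation explained by cluster membership
ss$eta_sq <- ss$between / (ss$between + ss$within)

# Sort from the most to the least discriminating variable
ss <- ss[order(-ss$eta_sq), ]
print(ss, digits = 3, row.names = FALSE)
```

On these figures Speed (about 0.55) and Ease of use (about 0.50) discriminate most strongly between clusters, while Security and Spending Control (both about 0.27) discriminate least.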

8 Criterion validity (significant descriptors)

NLB_data$Group <- NLB_CluStd$Group

8.1 Frequency of Mobile Payment Usage per Month

NLB_data$Phone_Use_merged <- ifelse(
  NLB_data$Phone_Use == 1, "never",
  ifelse(NLB_data$Phone_Use %in% c(2, 3), "irregular basis", "regular basis")
)

# Convert the merged column to a factor with levels in the desired order
NLB_data$Phone_Use_merged <- factor(
  NLB_data$Phone_Use_merged,
  levels = c("never", "irregular basis", "regular basis")
)

chi_square <- chisq.test(NLB_data$Phone_Use_merged, as.factor(NLB_data$Group))
chi_square

## 
##  Pearson's Chi-squared test
## 
## data:  NLB_data$Phone_Use_merged and as.factor(NLB_data$Group)
## X-squared = 19.394, df = 6, p-value = 0.003547

8.2 Frequency of Cash Usage for Purchases Up to 10 EUR

NLB_data$Cash_Up10_merged <- ifelse(
  NLB_data$Cash_Up10 %in% c(1, 2), "less than half",
  ifelse(NLB_data$Cash_Up10 == 3, "half", "more than half")
)

# Convert the merged column to a factor with levels in the desired order
NLB_data$Cash_Up10_merged <- factor(
  NLB_data$Cash_Up10_merged,
  levels = c("less than half", "half", "more than half")
)

chi_square <- chisq.test(NLB_data$Cash_Up10_merged, as.factor(NLB_data$Group))
chi_square

## 
##  Pearson's Chi-squared test
## 
## data:  NLB_data$Cash_Up10_merged and as.factor(NLB_data$Group)
## X-squared = 20.107, df = 6, p-value = 0.00265
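
Both chi-squared tests reject independence, but the statistic itself says little about the strength of the association. The sketch below computes Cramér's V from the two reported statistics. The helper `cramers_v()` is hypothetical, and the sample size of 304 is an assumption inferred from the ANOVA degrees of freedom (300 residual + 3 group + 1); it would be smaller if either descriptor has missing values.

```r
# Cramér's V from a chi-squared statistic, the sample size and the table dimensions
cramers_v <- function(chi_sq, n, n_rows, n_cols) {
  sqrt(chi_sq / (n * (min(n_rows, n_cols) - 1)))
}

n_obs <- 304  # assumption: inferred from the ANOVA degrees of freedom, no missing values

# Both contingency tables are 3 usage categories x 4 clusters (df = 6)
v_phone <- cramers_v(19.394, n_obs, n_rows = 3, n_cols = 4)
v_cash  <- cramers_v(20.107, n_obs, n_rows = 3, n_cols = 4)

round(c(mobile_payment = v_phone, cash_up_to_10 = v_cash), 3)
```

Under that assumption both values are close to 0.18, which points to a statistically significant but fairly weak association between cluster membership and each descriptor.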

9 Demographics

9.1 Demographics - significant

9.1.1 Frequency of Mobile Payment Usage per Month

# Calculate frequency by Mobile payment usage
Phone_freq <- NLB_data %>%
  group_by(Group, Phone_Use_merged) %>%
  summarise(Count = n(), .groups = 'drop') %>% 
  group_by(Group) %>%
  mutate(Percentage = Count / sum(Count) * 100)

# Plot the frequency by Mobile payment usage
ggplot(Phone_freq, aes(x = Group, y = Percentage, fill = Phone_Use_merged)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(
    title = "Distribution of Mobile Payment Usage Among Young Adults (18-27 y.o.)",
    x = "Group",
    y = "Percentage (%)",
    fill = "Frequency of Mobile Payment Usage"
  ) +
  scale_fill_manual(values = NLB_colors) +  # Use NLB colors for the fill
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels if needed

# Load required libraries
library(dplyr)
library(tidyr)
library(knitr)
library(kableExtra)

# Assuming Phone_freq is already calculated, reshape it
Phone_wide <- Phone_freq %>%
  select(-Count) %>%  # Remove the Count column (optional)
  spread(key = Phone_Use_merged, value = Percentage)  # Spread data across columns

# Create and style the wide format table with borders and grey title row
Phone_wide %>%
  kable(caption = "Mobile Payment Usage Distribution by Group (in %)", digits = 1) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
  row_spec(0, background = "lightgrey", bold = TRUE, color = "black") %>%  # Style header row
  column_spec(1, bold = TRUE) %>%  # Make the first column bold (Group column)
  kable_styling(bootstrap_options = "bordered")  # Add borders around the table

Mobile Payment Usage Distribution by Group (in %)
Group	never	irregular basis	regular basis
1	38.1	14.3	47.6
2	18.3	17.2	64.5
3	22.3	8.7	68.9
4	37.8	22.2	40.0

9.1.2 Frequency of Cash Usage for Purchases Up to 10 EUR

# Calculate frequency by cash usage for purchases up to 10 EUR
Purchases_freq <- NLB_data %>%
  group_by(Group, Cash_Up10_merged) %>%
  summarise(Count = n(), .groups = 'drop') %>%
  group_by(Group) %>%
  mutate(Percentage = Count / sum(Count) * 100)

# Plot the frequency by cash usage for purchases up to 10 EUR
ggplot(Purchases_freq, aes(x = Group, y = Percentage, fill = Cash_Up10_merged)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(
    title = "Distribution of Cash Payment for Purchases up to 10 EUR Among Young Adults (18-27 y.o.)",
    x = "Group",
    y = "Percentage (%)",
    fill = "Share of Cash Payments (Up to 10 EUR)"
  ) +
  scale_fill_manual(values = NLB_colors) +  # Use NLB colors for the fill
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels if needed

# Load required libraries
library(dplyr)
library(tidyr)
library(knitr)
library(kableExtra)

# Assuming Purchases_freq is already calculated, reshape it
Purchases_wide <- Purchases_freq %>%
  select(-Count) %>%  # Remove the Count column (optional)
  spread(key = Cash_Up10_merged, value = Percentage)  # Spread data across columns

# Create and style the wide format table with borders and grey title row
Purchases_wide %>%
  kable(caption = "Cash Usage for Purchases Distribution by Group (in %)", digits = 1) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
  row_spec(0, background = "lightgrey", bold = TRUE, color = "black") %>%  # Style header row
  column_spec(1, bold = TRUE) %>%  # Make the first column bold (Group column)
  kable_styling(bootstrap_options = "bordered")  # Add borders around the table

Cash Usage for Purchases Distribution by Group (in %)
Group	less than half	half	more than half
1	65.1	23.8	11.1
2	72.0	8.6	19.4
3	68.9	23.3	7.8
4	53.3	17.8	28.9

9.2 Demographics - not significant

9.2.1 Frequency of Card Payment Usage per Month

NLB_data$Card_Use_merged <- ifelse(
  NLB_data$Card_Use == 1, "never",
  ifelse(NLB_data$Card_Use %in% c(2, 3), "irregular basis", "regular basis"))

# Convert the merged column to a factor with levels in the desired order
NLB_data$Card_Use_merged <- factor(
  NLB_data$Card_Use_merged,
  levels = c("never", "irregular basis", "regular basis"))

# Calculate frequency by Card payment usage
Card_freq <- NLB_data %>%
  group_by(Group, Card_Use_merged) %>%
  summarise(Count = n(), .groups = 'drop') %>% 
  group_by(Group) %>%
  mutate(Percentage = Count / sum(Count) * 100)

# Plot the frequency by Card payment usage
ggplot(Card_freq, aes(x = Group, y = Percentage, fill = Card_Use_merged)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(
    title = "Distribution of Card Payment Usage Among Young Adults (18-27 y.o.)",
    x = "Group",
    y = "Percentage (%)",
    fill = "Frequency of Card Payment Usage"
  ) +
  scale_fill_manual(values = NLB_colors) +  # Use NLB colors for the fill
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels if needed

# Load required libraries
library(dplyr)
library(tidyr)
library(knitr)
library(kableExtra)

# Reshape Card_freq (calculated above) to wide format
Card_wide <- Card_freq %>%
  select(-Count) %>%  # Remove the Count column (optional)
  spread(key = Card_Use_merged, value = Percentage)  # Spread data across columns

# Create and style the wide format table with borders and grey title row
Card_wide %>%
  kable(caption = "Card Usage for Purchases Distribution by Group (in %)", digits = 1) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
  row_spec(0, background = "lightgrey", bold = TRUE, color = "black") %>%  # Style header row
  column_spec(1, bold = TRUE) %>%  # Make the first column bold (Group column)
  kable_styling(bootstrap_options = "bordered")  # Add borders around the table

Card Usage for Purchases Distribution by Group (in %)
Group	never	irregular basis	regular basis
1	6.3	28.6	65.1
2	10.8	28.0	61.3
3	4.9	32.0	63.1
4	6.7	26.7	66.7

9.2.2 Frequency of Other Payment (PayPal, Stripe, etc.) Usage per Month

NLB_data$OtherPay_Use_merged <- ifelse(
  NLB_data$OtherPay_Use == 1, "never",
  ifelse(NLB_data$OtherPay_Use %in% c(2, 3), "irregular basis", "regular basis"))

# Convert the merged column to a factor with levels in the desired order
NLB_data$OtherPay_Use_merged <- factor(
  NLB_data$OtherPay_Use_merged,
  levels = c("never", "irregular basis", "regular basis"))

# Calculate frequency by Other payment usage
OtherPay_freq <- NLB_data %>%
  group_by(Group, OtherPay_Use_merged) %>%
  summarise(Count = n(), .groups = 'drop') %>% 
  group_by(Group) %>%
  mutate(Percentage = Count / sum(Count) * 100)

# Plot the frequency by Other payment usage
ggplot(OtherPay_freq, aes(x = Group, y = Percentage, fill = OtherPay_Use_merged)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(
    title = "Distribution of Other Payment (PayPal, Stripe, etc.) Usage Among Young Adults (18-27 y.o.)",
    x = "Group",
    y = "Percentage (%)",
    fill = "Frequency of Other Payment Usage"
  ) +
  scale_fill_manual(values = NLB_colors) +  # Use NLB colors for the fill
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels if needed

# Load required libraries
library(dplyr)
library(tidyr)
library(knitr)
library(kableExtra)

# Assuming OtherPay_freq is already calculated, reshape it
OtherPay_wide <- OtherPay_freq %>%
  select(-Count) %>%  # Remove the Count column (optional)
  spread(key = OtherPay_Use_merged, value = Percentage)  # Spread data across columns

# Create and style the wide format table with borders and grey title row
OtherPay_wide %>%
  kable(caption = "Other Payment Methods Distribution by Group (in %)", digits = 1) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
  row_spec(0, background = "lightgrey", bold = TRUE, color = "black") %>%  # Style header row
  column_spec(1, bold = TRUE) %>%  # Make the first column bold (Group column)
  kable_styling(bootstrap_options = "bordered")  # Add borders around the table

Other Payment Methods Distribution by Group (in %)
Group	never	irregular basis	regular basis
1	73.0	25.4	1.6
2	57.0	38.7	4.3
3	50.5	42.7	6.8
4	64.4	33.3	2.2

9.2.3 Frequency of Cash Usage for Purchases 11-99 EUR

# Merge cash usage for purchases between 11-99 EUR
NLB_data$Cash_11_99_merged <- ifelse(
  NLB_data$Cash_11_99 %in% c(1, 2), "less than half",
  ifelse(NLB_data$Cash_11_99 == 3, "half", "more than half")
)

# Convert the merged column to a factor with levels in the desired order
NLB_data$Cash_11_99_merged <- factor(
  NLB_data$Cash_11_99_merged,
  levels = c("less than half", "half", "more than half")
)

# Calculate frequency by cash usage for purchases between 11-99 EUR
Purchases1199_freq <- NLB_data %>%
  group_by(Group, Cash_11_99_merged) %>%
  summarise(Count = n(), .groups = 'drop') %>%
  group_by(Group) %>%
  mutate(Percentage = Count / sum(Count) * 100)

# Plot the frequency by cash usage for purchases between 11-99 EUR
ggplot(Purchases1199_freq, aes(x = Group, y = Percentage, fill = Cash_11_99_merged)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(
    title = "Distribution of Cash Payment for Purchases Between 11-99 EUR Among Young Adults (18-27 y.o.)",
    x = "Group",
    y = "Percentage (%)",
    fill = "Share of Cash Payments (11-99 EUR)"
  ) +
  scale_fill_manual(values = NLB_colors) +  # Use NLB colors for the fill
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels if needed

# Load required libraries
library(dplyr)
library(tidyr)
library(knitr)
library(kableExtra)

# Reshape Purchases1199_freq to a wide format using pivot_wider()
Purchases_wide <- Purchases1199_freq %>%
  select(-Count) %>%  # Remove the Count column (optional)
  pivot_wider(names_from = Cash_11_99_merged, values_from = Percentage)  # Reshape data

# Create and style the wide format table with borders and grey title row
Purchases_wide %>%
  kable(caption = "Cash Usage for Purchases (11-99 EUR) Distribution by Group (in %)", digits = 1) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
  row_spec(0, background = "lightgrey", bold = TRUE, color = "black") %>%  # Style header row
  column_spec(1, bold = TRUE) %>%  # Make the first column bold (Group column)
  kable_styling(bootstrap_options = "bordered")  # Add borders around the table

Cash Usage for Purchases (11-99 EUR) Distribution by Group (in %)
Group	less than half	half	more than half
1	79.4	15.9	4.8
2	76.3	12.9	10.8
3	87.4	6.8	5.8
4	66.7	17.8	15.6

9.2.4 Frequency of Cash Usage for Purchases 100-1000 EUR

# Merge cash usage for purchases over 1000 EUR
NLB_data$Cash_Over1000_merged <- ifelse(
  NLB_data$Cash_Over1000 %in% c(1, 2), "less than half",
  ifelse(NLB_data$Cash_Over1000 == 3, "half", "more than half")
)

# Convert the merged column to a factor with levels in the desired order
NLB_data$Cash_Over1000_merged <- factor(
  NLB_data$Cash_Over1000_merged,
  levels = c("less than half", "half", "more than half")
)

# Calculate frequency by cash usage for purchases over 1000 EUR
PurchasesOver1000_freq <- NLB_data %>%
  group_by(Group, Cash_Over1000_merged) %>%
  summarise(Count = n(), .groups = 'drop') %>%
  group_by(Group) %>%
  mutate(Percentage = Count / sum(Count) * 100)

# Plot the frequency by cash usage for purchases over 1000 EUR
ggplot(PurchasesOver1000_freq, aes(x = Group, y = Percentage, fill = Cash_Over1000_merged)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(
    title = "Distribution of Cash Payment for Purchases Over 1000 EUR Among Young Adults (18-27 y.o.)",
    x = "Group",
    y = "Percentage (%)",
    fill = "Share of Cash Payments (Over 1000 EUR)"
  ) +
  scale_fill_manual(values = NLB_colors) +  # Use NLB colors for the fill
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels if needed

# Load required libraries
library(dplyr)
library(tidyr)
library(knitr)
library(kableExtra)

# Reshape PurchasesOver1000_freq to a wide format using pivot_wider()
Purchases_wide <- PurchasesOver1000_freq %>%
  select(-Count) %>%  # Remove the Count column (optional)
  pivot_wider(names_from = Cash_Over1000_merged, values_from = Percentage)  # Reshape data

# Create and style the wide format table with borders and grey title row
Purchases_wide %>%
  kable(caption = "Cash Usage for Purchases (Over 1000 EUR) Distribution by Group (in %)", digits = 1) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
  row_spec(0, background = "lightgrey", bold = TRUE, color = "black") %>%  # Style header row
  column_spec(1, bold = TRUE) %>%  # Make the first column bold (Group column)
  kable_styling(bootstrap_options = "bordered")  # Add borders around the table

Cash Usage for Purchases (Over 1000 EUR) Distribution by Group (in %)
Group	less than half	half	more than half
1	87.3	1.6	11.1
2	92.5	3.2	4.3
3	92.2	1.0	6.8
4	82.2	6.7	11.1

9.2.5 Age

# Calculate percentage by Group
Age_freq <- NLB_data %>%
  group_by(Group, AgeF) %>%
  summarise(Count = n(), .groups = 'drop') %>%
  group_by(Group) %>%
  mutate(Percentage = Count / sum(Count) * 100)

NLB_colors2 <- c("#230078",    # Deep Blue
                 "#3A1A8B",    # Purple-blue
                 "#4D33B1",    # Lighter purple-blue
                 "#5F47D8",    # Soft lavender
                 "#84BD00",    # Green (from the original)
                 "#A7C700",    # Soft green
                 "#98FA00",    # Light green
                 "#FA7800",    # Orange (from the original)
                 "#FF9A33",    # Lighter orange
                 "#FFB266")    # Light peachy-orange


# Plot the percentage by Age
ggplot(Age_freq, aes(x = Group, y = Percentage, fill = as.factor(AgeF))) +
  geom_bar(stat = "identity", position = "stack") +
  labs(
    title = "Age Distribution by Cluster, Young Adults (18-27 y.o.)",
    x = "Group",
    y = "Percentage (%)",
    fill = "Age"
  ) +
  scale_fill_manual(values = NLB_colors2) +  # Apply the new homogeneous color palette
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels if needed

# Load required libraries
library(dplyr)
library(tidyr)
library(knitr)
library(kableExtra)

# Assuming Age_freq is already calculated, reshape it
Age_wide <- Age_freq %>%
  select(-Count) %>%  # Remove the Count column (optional)
  pivot_wider(
    names_from = AgeF,
    values_from = Percentage,
    values_fill = list(Percentage = 0)  # Replace NA with 0
  )  # Reshape data

# Create and style the wide format table with borders and grey title row
Age_wide %>%
  kable(caption = "Age Distribution by Group (in %)", digits = 1) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
  row_spec(0, background = "lightgrey", bold = TRUE, color = "black") %>%  # Style header row
  column_spec(1, bold = TRUE) %>%  # Make the first column bold (Group column)
  kable_styling(bootstrap_options = "bordered")  # Add borders around the table

Age Distribution by Group (in %)
Group	18	19	20	21	22	23	24	25	26	27
1	4.8	3.2	14.3	9.5	12.7	23.8	14.3	9.5	6.3	1.6
2	1.1	7.5	8.6	8.6	15.1	24.7	8.6	20.4	2.2	3.2
3	4.9	9.7	12.6	7.8	13.6	15.5	10.7	13.6	4.9	6.8
4	13.3	4.4	22.2	2.2	17.8	22.2	4.4	8.9	4.4	0.0

# Convert AgeF from factor to numeric
NLB_data$AgeF <- as.numeric(as.character(NLB_data$AgeF))

# Now aggregate and calculate the mean
Age_means <- aggregate(AgeF ~ Group, data = NLB_data, FUN = mean, na.rm = TRUE)

# Print results
print(Age_means)

##   Group     AgeF
## 1     1 22.47619
## 2     2 22.75269
## 3     3 22.49515
## 4     4 21.62222

9.2.6 Income

# Calculate frequency by Income (Income_LevelF) and Group
income_freq <- NLB_data %>%
  group_by(Group, Income_LevelF) %>%
  summarise(Count = n(), .groups = 'drop') %>%
  group_by(Group) %>%
  mutate(Percentage = Count / sum(Count) * 100)

# Plot the frequency by Income
ggplot(income_freq, aes(x = Group, y = Percentage, fill = Income_LevelF)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(
    title = "Distribution of Income Among Young Adults (18-27 y.o.)",
    x = "Group",
    y = "Percentage (%)",
    fill = "Income"
  ) +
  scale_fill_manual(values = NLB_colors) +  # Apply the previous color palette
  theme_minimal()  # Using minimal theme

# Load required libraries
library(dplyr)
library(tidyr)
library(knitr)
library(kableExtra)

# Reshape the data using pivot_wider and remove NA column
income_wide <- income_freq %>%
  select(-Count) %>%  # Remove the Count column (optional)
  pivot_wider(names_from = Income_LevelF, values_from = Percentage) %>%
  select(-any_of("NA"))  # Remove the column with NA values (no error if it does not exist)

# Create and style the wide format table with borders and grey title row
income_wide %>%
  kable(caption = "Income Distribution by Group (in %)", digits = 1) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
  row_spec(0, background = "lightgrey", bold = TRUE, color = "black") %>%  # Style header row
  column_spec(1, bold = TRUE) %>%  # Make the first column bold (Group column)
  kable_styling(bootstrap_options = "bordered")  # Add borders around the table

Income Distribution by Group (in %)
Group	0-200 EUR	201-500 EUR	501-800 EUR	801-1300 EUR	Above 1300 EUR
1	12.7	41.3	15.9	15.9	14.3
2	17.2	25.8	20.4	17.2	19.4
3	12.6	29.1	22.3	13.6	21.4
4	24.4	24.4	13.3	17.8	17.8

# Create a new column with calculated midpoints of Income Level
NLB_data <- NLB_data %>%
  mutate(Income_Mid = case_when(
    Income_LevelF == "0-200 EUR" ~ (0 + 200) / 2,
    Income_LevelF == "201-500 EUR" ~ (201 + 500) / 2,
    Income_LevelF == "501-800 EUR" ~ (501 + 800) / 2,
    Income_LevelF == "801-1300 EUR" ~ (800 + 1300) / 2,
    Income_LevelF == "Above 1300 EUR" ~ (1300 + 1700) / 2,  # Assuming 1300-1700 as range
    TRUE ~ NA_real_  # Assign NA for any unexpected values
  ))

# Print first rows to verify transformation
print(head(NLB_data[, c("Income_LevelF", "Income_Mid")]))

## # A tibble: 6 × 2
##   Income_LevelF Income_Mid
##   <fct>              <dbl>
## 1 201-500 EUR         350.
## 2 201-500 EUR         350.
## 3 501-800 EUR         650.
## 4 201-500 EUR         350.
## 5 801-1300 EUR       1050 
## 6 201-500 EUR         350.

# Compute mean income for each cluster
Income_means <- aggregate(Income_Mid ~ Group, data = NLB_data, FUN = mean, na.rm = TRUE)

# Print results
print(Income_means)

##   Group Income_Mid
## 1     1   641.5556
## 2     2   711.5215
## 3     3   730.1618
## 4     4   664.9659
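
The group means above depend on the midpoint assumed for the open-ended "Above 1300 EUR" bracket. As a rough sensitivity check, the sketch below recomputes the means from the rounded percentages in the income table under alternative top-bracket midpoints. The values 1400 and 2000 are hypothetical alternatives chosen only for illustration, and because the inputs are rounded percentages the results match the means above only approximately.

```r
# Percentages copied from the income table above (rows = clusters 1 to 4)
income_pct <- matrix(
  c(12.7, 41.3, 15.9, 15.9, 14.3,
    17.2, 25.8, 20.4, 17.2, 19.4,
    12.6, 29.1, 22.3, 13.6, 21.4,
    24.4, 24.4, 13.3, 17.8, 17.8),
  nrow = 4, byrow = TRUE,
  dimnames = list(as.character(1:4),
                  c("0-200", "201-500", "501-800", "801-1300", "Above 1300"))
)

# Mean income per cluster for a given midpoint of the open-ended top bracket
mean_income <- function(top_mid) {
  mids <- c(100, 350.5, 650.5, 1050, top_mid)  # first four match the midpoints used above
  # Dividing by the row sums ignores respondents with missing income
  as.vector(income_pct %*% mids) / rowSums(income_pct)
}

# 1500 is the midpoint used above; 1400 and 2000 are hypothetical alternatives
sensitivity <- sapply(c(`1400` = 1400, `1500` = 1500, `2000` = 2000), mean_income)
rownames(sensitivity) <- paste("Group", 1:4)
round(sensitivity, 1)
```

With these inputs the ordering of the clusters (3, 2, 4, 1 from highest to lowest) is the same in all three scenarios, so the ranking does not hinge on the 1300-1700 assumption, although the absolute levels do.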

9.2.7 Employment status

# Calculate frequency by Employment status (Status_Employment) and Group
status_freq <- NLB_data %>%
  group_by(Group, Status_EmploymentF) %>%
  summarise(Count = n(), .groups = 'drop') %>%
  group_by(Group) %>%
  mutate(Percentage = Count / sum(Count) * 100)

# Plot the frequency by status
ggplot(status_freq, aes(x = Group, y = Percentage, fill = Status_EmploymentF)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(
    title = "Distribution of Employment Status Among Young Adults (18-27 y.o.)",
    x = "Group",
    y = "Percentage (%)",
    fill = "Employment status"
  ) +
  scale_fill_manual(values = NLB_colors) +  # Apply the NLB colors palette
  theme_minimal()  # Keep the minimal theme

# Load required libraries
library(dplyr)
library(tidyr)
library(knitr)
library(kableExtra)

# Reshape the data using pivot_wider
status_wide <- status_freq %>%
  select(-Count) %>%  # Remove the Count column (optional)
  pivot_wider(
    names_from = Status_EmploymentF,
    values_from = Percentage,
    values_fill = list(Percentage = 0)  # Replace NA with 0
  )  # Reshape data to wide format

# Create and style the wide format table with borders and grey title row
status_wide %>%
  kable(caption = "Employment Status Distribution by Group (in %)", digits = 1) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
  row_spec(0, background = "lightgrey", bold = TRUE, color = "black") %>%  # Style header row
  column_spec(1, bold = TRUE) %>%  # Make the first column bold (Group column)
  kable_styling(bootstrap_options = "bordered")  # Add borders around the table

Employment Status Distribution by Group (in %)
Group	Student	Employed	Self-employed	Other	Unemployed
1	82.5	14.3	1.6	1.6	0.0
2	82.8	15.1	2.2	0.0	0.0
3	72.8	17.5	3.9	2.9	2.9
4	71.1	17.8	0.0	11.1	0.0

# Reclassify Employment Status
NLB_data <- NLB_data %>%
  mutate(Status_EmploymentF = case_when(
    Status_EmploymentF == "Employed" ~ "With Job",
    Status_EmploymentF == "Self-employed" ~ "With Job",
    TRUE ~ as.character(Status_EmploymentF)  # Keep other categories unchanged (as character, to match "With Job")
  ))

# Recalculate the frequency by employment status and group
status_freq <- NLB_data %>%
  group_by(Group, Status_EmploymentF) %>%
  summarise(Count = n(), .groups = 'drop') %>%
  group_by(Group) %>%
  mutate(Percentage = Count / sum(Count) * 100)

# Print only the percentages for "Student" and "With Job"

# Load required libraries
library(dplyr)
library(tidyr)
library(knitr)
library(kableExtra)

# Filter and reshape the data for "Student" and "With Job"
status_wide_filtered <- status_freq %>%
  filter(Status_EmploymentF %in% c("Student", "With Job")) %>%  # Filter specific statuses
  select(-Count) %>%  # Remove the Count column (optional)
  pivot_wider(names_from = Status_EmploymentF, values_from = Percentage)  # Reshape data to wide format

# Create and style the wide format table
status_wide_filtered %>%
  kable(caption = "Percentage of Students and Individuals With Jobs by Group (in %)", digits = 1) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
  row_spec(0, background = "lightgrey", bold = TRUE, color = "black") %>%  # Style header row
  column_spec(1, bold = TRUE) %>%  # Make the first column bold (Group column)
  kable_styling(bootstrap_options = "bordered")  # Add borders around the table

Percentage of Students and Individuals With Jobs by Group (in %)
Group	Student	With Job
1	82.5	15.9
2	82.8	17.2
3	72.8	21.4
4	71.1	17.8

9.2.8 Education

# Calculate frequency by Education (EducationF) and Group
education_freq <- NLB_data %>%
  group_by(Group, EducationF) %>%
  summarise(Count = n(), .groups = 'drop') %>%
  group_by(Group) %>%
  mutate(Percentage = Count / sum(Count) * 100)

# Plot the frequency by Education
ggplot(education_freq, aes(x = Group, y = Percentage, fill = EducationF)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(
    title = "Distribution of Education Level Among Young Adults (18-27 y.o.)",
    x = "Group",
    y = "Percentage (%)",
    fill = "Education level"
  ) +
  scale_fill_manual(values = NLB_colors) +  # Apply the NLB colors palette
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels if needed

# Load required libraries
library(dplyr)
library(tidyr)
library(knitr)
library(kableExtra)

# Reshape the data using pivot_wider
education_wide <- education_freq %>%
  select(-Count) %>%  # Remove the Count column (optional)
  pivot_wider(
    names_from = EducationF,
    values_from = Percentage,
    values_fill = list(Percentage = 0)  # Replace NA with 0
  )  # Reshape data to wide format

# Create and style the wide format table
education_wide %>%
  kable(caption = "Education Level Distribution by Group (in %)", digits = 1) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
  row_spec(0, background = "lightgrey", bold = TRUE, color = "black") %>%  # Style header row
  column_spec(1, bold = TRUE) %>%  # Make the first column bold (Group column)
  kable_styling(bootstrap_options = "bordered")  # Add borders around the table

Education Level Distribution by Group (in %)
Group	High School	Bachelor Degree	Master Degree	PhD	Primary school	Vocational education
1	46.0	41.3	12.7	0.0	0.0	0.0
2	44.1	32.3	21.5	2.2	0.0	0.0
3	39.8	31.1	24.3	0.0	2.9	1.9
4	46.7	24.4	17.8	0.0	11.1	0.0

9.2.9 Response

# Calculate frequency by Response
response_freq <- NLB_data %>%
  group_by(Group, NoDigital_ResponseF) %>%
  summarise(Count = n(), .groups = 'drop') %>%
  group_by(Group) %>%
  mutate(Percentage = Count / sum(Count) * 100)

# Plot the frequency by Response
ggplot(response_freq, aes(x = Group, y = Percentage, fill = NoDigital_ResponseF)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(
    title = "Responses to Merchants Not Accepting Digital Payments - Young Adults (18-27 y.o.)",
    x = "Group",
    y = "Percentage (%)",
    fill = "Response"
  ) +
  scale_fill_manual(values = NLB_colors) +  # Apply the NLB colors palette
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels if needed

# Reshape the data using pivot_wider and replace NA with 0
response_wide <- response_freq %>%
  select(-Count) %>%  # Remove the Count column (optional)
  pivot_wider(
    names_from = NoDigital_ResponseF,
    values_from = Percentage,
    values_fill = list(Percentage = 0)  # Replace NA with 0
  )

# Create and style the wide format table
response_wide %>%
  kable(caption = "Responses to Merchants Not Accepting Digital Payments by Group (in %)", digits = 1) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
  row_spec(0, background = "lightgrey", bold = TRUE, color = "black") %>%  # Style header row
  column_spec(1, bold = TRUE) %>%  # Make the first column bold (Group column)
  kable_styling(bootstrap_options = "bordered")  # Add borders around the table

Responses to Merchants Not Accepting Digital Payments by Group (in %)
Group	Pay digital elsewhere	Pay as available	Never occurred	Other
1	14.3	63.5	12.7	9.5
2	22.6	64.5	7.5	5.4
3	15.5	62.1	18.4	3.9
4	13.3	68.9	17.8	0.0

9.2.10 Banks

# Calculate frequency by Banks
Banks_freq <- NLB_data %>%
  group_by(Group, Primary_BankF) %>%
  summarise(Count = n(), .groups = 'drop') %>%
  group_by(Group) %>%
  mutate(Percentage = Count / sum(Count) * 100)

# Plot the frequency by Bank
ggplot(Banks_freq, aes(x = Group, y = Percentage, fill = Primary_BankF)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(
    title = "Distribution of Primary Banks Used by Young Adults (18-27 y.o.)",
    x = "Group",
    y = "Percentage (%)",
    fill = "Bank"
  ) +
  scale_fill_manual(values = NLB_colors) +  # Apply the NLB colors palette
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels if needed

# Reshape the data using pivot_wider and replace NA with 0
banks_wide <- Banks_freq %>%
  select(-Count) %>%  # Remove the Count column (optional)
  pivot_wider(
    names_from = Primary_BankF,
    values_from = Percentage,
    values_fill = list(Percentage = 0)  # Replace NA with 0
  )

# Create and style the wide format table
banks_wide %>%
  kable(caption = "Distribution of Primary Banks Used by Group (in %)", digits = 1) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
  row_spec(0, background = "lightgrey", bold = TRUE, color = "black") %>%  # Style header row
  column_spec(1, bold = TRUE) %>%  # Make the first column bold (Group column)
  kable_styling(bootstrap_options = "bordered")  # Add borders around the table

Distribution of Primary Banks Used by Group (in %)
Group	NLB	OTP	Intesa Sanpaolo	Addiko Bank	Delavska Hranilnica	Other	Sparkasse
1	39.7	36.5	9.5	1.6	3.2	9.5	0.0
2	31.2	36.6	7.5	2.2	5.4	11.8	5.4
3	48.5	34.0	2.9	2.9	7.8	2.9	1.0
4	37.8	37.8	2.2	4.4	8.9	6.7	2.2

9.2.11 Gender

# Calculate frequency by Gender
Gender_freq <- NLB_data %>%
  group_by(Group, GenderF) %>%
  summarise(Count = n(), .groups = 'drop') %>%
  group_by(Group) %>%
  mutate(Percentage = Count / sum(Count) * 100)

# Plot the frequency by Gender
ggplot(Gender_freq, aes(x = Group, y = Percentage, fill = GenderF)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(
    title = "Distribution of Gender Among Young Adults (18-27 y.o.)",
    x = "Group",
    y = "Percentage (%)",
    fill = "Gender"
  ) +
  scale_fill_manual(values = NLB_colors) +  # Apply the NLB colors palette
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels if needed

# Reshape the data using pivot_wider and replace NA with 0
gender_wide <- Gender_freq %>%
  select(-Count) %>%  # Remove the Count column (optional)
  pivot_wider(
    names_from = GenderF,
    values_from = Percentage,
    values_fill = list(Percentage = 0)  # Replace NA with 0
  )

# Create and style the wide format table
gender_wide %>%
  kable(caption = "Gender Distribution by Group (in %)", digits = 1) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = F) %>%
  row_spec(0, background = "lightgrey", bold = TRUE, color = "black") %>%  # Style header row
  column_spec(1, bold = TRUE) %>%  # Make the first column bold (Group column)
  kable_styling(bootstrap_options = "bordered")  # Add borders around the table

Gender Distribution by Group (in %)
Group	Man	Woman	Other	Don’t want to answer
1	33.3	65.1	1.6	0.0
2	40.9	58.1	0.0	1.1
3	40.8	59.2	0.0	0.0
4	42.2	57.8	0.0	0.0

10 Cluster description

10.1 Cluster 1: Security-Conscious Users

Group demographics: The average age in this group is 22.5 years. The gender distribution is unbalanced, with 65.1% women and 33.3% men. The most common education level is high school (46%), followed by a bachelor’s degree (41.3%). NLB is the most common primary bank (39.7%), and the average income is 641.55 EUR.

Group behaviors: Regarding mobile payments, 47.6% of people in this group use mobile payment methods regularly, and 65.1% of payments are made using card.
Cash is used for more than half of small purchases (up to 10 EUR) in only 11.1% of cases, for medium purchases (11–99 EUR) in 4.8% of cases, and for large purchases (100–1000 EUR) in 11.1% of cases.

Key Characteristics:
- Security: This group places significant importance on security (mean: 0.86), reflecting their perception that traditional methods like cash and cards provide safer payment options. Their comfort with traditional methods is likely influenced by their strong sense of security, which aligns with the high mean for security on cards and cash in comparison to mobile payments or neobanks.
- Digital Payments: They exhibit lower usage of mobile payments (38.1% never use them), preferring more established methods perceived as secure, such as cards (mean: 0.65 for “Ease of Use” of cards, regularly used by 65.1%). Neobanks and mobile wallets are less favored in comparison, likely due to concerns over the security (mean: 0.86) and privacy (mean: 0.51) of newer digital platforms.
- Bank Preferences: Their choice of banks is highly local, with NLB (39.7%) and OTP (36.5%) being the most frequently used. This reflects their reliance on well-known and trusted financial institutions that provide traditional services and emphasize security.

These individuals prioritize security and privacy, particularly when it comes to transactions involving larger sums or unfamiliar methods. While they are open to digital payments, they remain cautious and conservative in their payment preferences, sticking to well-known and reliable forms like cash and debit cards. This group’s higher reliance on traditional payment methods may reflect broader generational trust issues with new technologies, particularly with respect to data privacy.

10.2 Cluster 2: Tech-Savvy Enthusiasts

Group demographics: The average age in this group is 22.8 years. The gender distribution is slightly unbalanced, with 58.1% women and 40.9% men. The most common education level is high school (44.1%), followed by a bachelor’s degree (32.3%), and 21.5% have a master’s degree. The primary banks used by this group are OTP (36.6%) and NLB (31.2%), with an average income of 711.52 EUR.

Group behaviors: Regarding mobile payments, 64.5% of people in this group use mobile payment methods regularly, and 61.3% of payments are made using card.
Cash payments are used for more than half of small purchases (up to 10 EUR) in 19.4% of cases, for medium purchases (11–99 EUR) in 10.8% of cases, and for large purchases (100–1000 EUR) in 4.3% of cases.

Key Characteristics:
- Payment Method Preferences: This group shows the highest preference for mobile payments (64.5% use regularly), reflecting their openness to adopting new payment technologies. This is consistent with the high ease of use mean for mobile payments (mean: 0.79), as they perceive these methods as efficient and simple to use.
- Tech Enthusiasm: They value ease of use (mean: 0.79) and speed (mean: 0.69) of transactions, and are less concerned with the traditional norms surrounding payment methods. Their perception of neobanks (mean: 0.71 for Ease of Use) and mobile payments as fast, efficient, and modern speaks to their affinity for cutting-edge technologies.
- Bank Preferences: They utilize a more diverse set of banks, with OTP (36.6%) and NLB (31.2%) being the most common, but their preference for newer, tech-savvy banking solutions likely supports their high engagement with mobile wallets.

The Tech-Savvy Enthusiasts are the early adopters of digital innovations. They prioritize convenience, flexibility, and speed, which digital payment solutions like mobile payments and neobanks deliver. This group is more comfortable embracing new payment systems and frequently uses them to support their on-the-go, fast-paced lifestyles. Their high engagement with mobile payments (64.5% use regularly) showcases their affinity for technology and digital banking solutions.

10.3 Cluster 3: Pragmatic Minimalists

Group demographics: The average age in this group is 22.5 years. The gender distribution is slightly unbalanced, with 59.2% women and 40.8% men. The most common education level is high school (39.8%), followed by a bachelor’s degree (31.1%), and 24.3% have a master’s degree. The most common bank in this group is NLB (48.5%), with an average income of 730.2 EUR.

Group behaviors: Regarding mobile payments, 68.9% of people in this group use mobile payment methods regularly, and 63.1% of payments are made using card.
Cash payments are used for more than half of small purchases (up to 10 EUR) in 7.8% of cases, for medium purchases (11–99 EUR) in 5.8% of cases, and for large purchases (100–1000 EUR) in 6.8% of cases.

Key Characteristics:
- Payment Method Preferences: This group exhibits a pragmatic approach: 68.9% use mobile payments regularly and 63.1% use cards regularly, while only 7.8% use cash for more than half of their purchases under 10 EUR. They have moderate ease of use perceptions for cards (mean: 0.47), reflecting their comfort with these methods, and show a measured openness to newer solutions.
- Efficiency: Their focus is on ease of use (mean: 0.47) and availability (mean: 0.04). This group values functionality over novelty, choosing payment methods that are reliable and widely accepted.
- Bank Preferences: Similar to the other clusters, they show a preference for local banks like NLB (48.5%) and OTP (34.0%), aligning with their practical approach to financial decisions. Their regular use of mobile payments (68.9%) is close to that of the Tech-Savvy Enthusiasts (64.5%), and they also use cards regularly.

The Pragmatic Minimalists are skeptical of excessive novelty but still open to using digital payment methods that are straightforward and well-accepted. They are not as enthusiastic as the Tech-Savvy Enthusiasts, but they prefer the familiarity and simplicity of physical cards and cash for transactions. This group likely seeks functionality and efficiency without unnecessary complexity.

10.4 Cluster 4: Skeptical Traditionalists

Group demographics: The average age in this group is 21.6 years, the youngest among all clusters. The gender distribution leans toward women (57.8%), with 42.2% men. The most common education level is high school (46.7%), while 24.4% have a bachelor’s degree and 17.8% have a master’s degree. The most popular banks in this group are OTP (37.8%) and NLB (37.8%), with an average income of 665.0 EUR.

Group behaviors: Regarding mobile payments, 40.0% of people in this group use mobile payment methods regularly, and 66.7% of payments are made using card.
Cash use is the highest of all clusters for small and medium purchases: cash is used for more than half of small purchases (up to 10 EUR) in 28.9% of cases, for medium purchases (11–99 EUR) in 15.6% of cases, and for large purchases (100–1000 EUR) in 11.1% of cases.

Key Characteristics:
- Payment Method Preferences: This group is the most cautious toward mobile and digital payments, with the highest percentage of respondents using cash for smaller purchases (53.3% use cash for purchases under 10 EUR). They are also less likely to use neobanks or mobile payments on a regular basis, with cash usage being their strongest preference (mean: -1.05 for Ease of Use for mobile payments).
- Traditional Preferences: They favor cash for purchases across all ranges, and are more likely to see cash as offering better control over spending (mean: -0.22 for Spending Control). Their perception of traditional payment methods is deeply rooted in trust and familiarity.
- Bank Preferences: Like the others, this group uses NLB and OTP, with a notable proportion using other local banks.

The Skeptical Traditionalists show resistance to change, preferring more familiar, traditional payment methods. Their inclination towards cash and reluctance toward mobile payments or neobanks likely stems from their concerns about security and privacy. They have a strong belief in having control over their spending (mean: -0.22 for Spending Control), which they feel is best achieved with tangible forms of payment like cash or debit cards.
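The cluster means quoted in the descriptions above are averages of standardised scores within each group. A minimal sketch of that calculation follows; the data are invented for illustration, and the column names (PCASafety, PCAEase, PCAControl) are assumed from the constructs named in the introduction rather than taken from the actual analysis code.

```r
# Illustrative component scores only (not the survey data); column names are
# assumptions based on the constructs named in the introduction
toy_scores <- data.frame(
  Group      = c(1, 1, 2, 2, 3, 3, 4, 4),
  PCASafety  = c(0.9, 0.8, -0.1, 0.1, 0.2, 0.0, -0.6, -0.4),
  PCAEase    = c(0.6, 0.7, 0.8, 0.8, 0.5, 0.4, -1.0, -1.1),
  PCAControl = c(0.1, 0.3, 0.2, 0.0, 0.1, -0.1, -0.2, -0.2)
)

# Mean standardised score of each construct within each cluster
cluster_means <- aggregate(cbind(PCASafety, PCAEase, PCAControl) ~ Group,
                           data = toy_scores, FUN = mean)
print(round(cluster_means, 2))
```

Because the scores are standardised, a positive mean indicates that the cluster sits above the sample average on that construct and a negative mean indicates that it sits below.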

11 Perception map

library(pastecs)

NLB_PCA <- NLB_data[ , c("Safe_Cash", "Safe_DebitCard", "Safe_CreditCard", "Safe_PhonePay", "Safe_Neobank", 
                         "Easy_Cash", "Easy_DebitCard", "Easy_CreditCard", "Easy_PhonePay", "Easy_Neobank", 
                         "Accept_Cash", "Accept_DebitCard", "Accept_CreditCard", "Accept_PhonePay", "Accept_Neobank", 
                         "Fast_Cash", "Fast_DebitCard", "Fast_CreditCard", "Fast_PhonePay", "Fast_Neobank", 
                         "Private_Cash", "Private_DebitCard", "Private_CreditCard", "Private_PhonePay", "Private_Neobank", 
                         "Control_Cash", "Control_DebitCard", "Control_CreditCard", "Control_PhonePay", "Control_Neobank")]

library(dplyr)

NLB_PCA <- NLB_PCA %>%
  rename(
    Cash_Security = Safe_Cash,
    Debit_Security = Safe_DebitCard,
    Credit_Security = Safe_CreditCard,
    Mobile_Security = Safe_PhonePay,
    NeoBanks_Security = Safe_Neobank,
    
    Cash_Ease = Easy_Cash,
    Debit_Ease = Easy_DebitCard,
    Credit_Ease = Easy_CreditCard,
    Mobile_Ease = Easy_PhonePay,
    NeoBanks_Ease = Easy_Neobank,
    
    Cash_Availability = Accept_Cash,
    Debit_Availability = Accept_DebitCard,
    Credit_Availability = Accept_CreditCard,
    Mobile_Availability = Accept_PhonePay,
    NeoBanks_Availability = Accept_Neobank,
    
    Cash_Speed = Fast_Cash,
    Debit_Speed = Fast_DebitCard,
    Credit_Speed = Fast_CreditCard,
    Mobile_Speed = Fast_PhonePay,
    NeoBanks_Speed = Fast_Neobank,
    
    Cash_Privacy = Private_Cash,
    Debit_Privacy = Private_DebitCard,
    Credit_Privacy = Private_CreditCard,
    Mobile_Privacy = Private_PhonePay,
    NeoBanks_Privacy = Private_Neobank,
    
    Cash_Control = Control_Cash,
    Debit_Control = Control_DebitCard,
    Credit_Control = Control_CreditCard,
    Mobile_Control = Control_PhonePay,
    NeoBanks_Control = Control_Neobank)

library(tibble)
perceptual <- NLB_PCA %>% 
  pivot_longer(everything(), names_to = "name", values_to = "score")  %>% 
  separate(name, into = c("Payment method", "Variable"), sep = "_")%>% 
  pivot_wider(names_from = Variable, values_from = score, values_fn = mean) %>%
  column_to_rownames(var = "Payment method")

print(perceptual)

##          Security     Ease Availability    Speed  Privacy  Control
## Cash     5.503289 5.101974     6.226974 4.075658 5.680921 4.851974
## Debit    5.138158 6.148026     6.483553 6.174342 4.457237 5.154605
## Credit   4.947368 6.019737     6.335526 6.121711 4.457237 5.006579
## Mobile   5.174342 6.480263     6.029605 6.526316 4.394737 5.161184
## NeoBanks 4.825658 5.592105     5.075658 5.684211 4.483553 5.115132

library(FactoMineR)
pca <- PCA(perceptual, 
           scale.unit = TRUE, 
           graph = FALSE,
           ncp = 2)

print(pca$var$cor)

##                   Dim.1        Dim.2
## Security     -0.7563320  0.534605714
## Ease          0.8659604  0.465478553
## Availability -0.1808968  0.942766082
## Speed         0.9753947  0.204881709
## Privacy      -0.9934258  0.026725632
## Control       0.9275822 -0.001610424

library(factoextra)
fviz_pca_biplot(pca, 
                repel = TRUE)

11.1 Interpretation of the PCA Biplot (Perception Map)

This Principal Component Analysis (PCA) biplot provides a visual representation of financial payment methods and how they relate to key perceptual factors such as security, privacy, speed, availability, control, and ease of use.

11.1.1 Axes Interpretation:

Dim1 (69.2%) and Dim2 (23.9%) together explain 93.1% of the variance, meaning most of the variation in the data can be understood from this two-dimensional representation.
Dim1 (X-axis) likely captures a contrast between traditional and modern payment methods, with Cash positioned far to the left and Mobile payments, Credit, and Debit cards toward the right.
Dim2 (Y-axis) is driven mainly by availability (correlation of 0.94 with Dim2), with smaller positive contributions from security (0.53) and ease of use (0.47).
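The variance shares and the variable correlations can be cross-checked without FactoMineR. The sketch below re-enters the mean scores from the printed table and uses base R's prcomp as a stand-in for PCA(); the object names (perceptual_demo, pca_demo) are introduced here for illustration only. Since it starts from the same table, it should give the same variance shares as those on the axes, although the sign of each component may be flipped.

```r
# Mean perception scores per payment method, copied from the printed table above
perceptual_demo <- data.frame(
  Security     = c(5.503289, 5.138158, 4.947368, 5.174342, 4.825658),
  Ease         = c(5.101974, 6.148026, 6.019737, 6.480263, 5.592105),
  Availability = c(6.226974, 6.483553, 6.335526, 6.029605, 5.075658),
  Speed        = c(4.075658, 6.174342, 6.121711, 6.526316, 5.684211),
  Privacy      = c(5.680921, 4.457237, 4.457237, 4.394737, 4.483553),
  Control      = c(4.851974, 5.154605, 5.006579, 5.161184, 5.115132),
  row.names    = c("Cash", "Debit", "Credit", "Mobile", "NeoBanks")
)

# PCA on standardised variables (base R counterpart of scale.unit = TRUE)
pca_demo <- prcomp(perceptual_demo, center = TRUE, scale. = TRUE)

# Share of total variance carried by each component, in percent
var_share <- 100 * pca_demo$sdev^2 / sum(pca_demo$sdev^2)
print(round(var_share, 1))

# Correlation of each original variable with the first two components
# (the sign of a component is arbitrary and may differ from FactoMineR)
var_cor <- cor(perceptual_demo, pca_demo$x[, 1:2])
print(round(var_cor, 3))
```

Reading var_cor row by row shows which attributes drive each axis, which is the basis for the axis interpretation.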

11.1.2 Key Observations:

Cash vs. Digital Payments
- Cash is located far left, strongly associated with privacy and security but negatively associated with speed, ease of use, and control.
- Credit, Debit, and Mobile Payments are clustered together on the right, indicating that they share common characteristics, particularly ease of use, speed, and control.
NeoBanks (Fintech-Driven Banking)
- NeoBanks are positioned lower in Dim2, which is consistent with their lowest availability score (5.08) and lowest security score (4.83), and may indicate unfamiliarity or lower trust compared to traditional banking options.
- They may also be less associated with security and privacy, as they are further away from these vectors.
Direction and Meaning of Vectors (Arrows):
- Security points toward the upper left and Privacy almost directly left, both aligning with Cash, indicating that users who prioritize these factors tend to prefer cash transactions.
- Speed, Ease of Use, and Control point toward the right, aligning with Mobile and Digital payment methods, suggesting that these are perceived as efficient and user-friendly.
- Availability points almost straight up (correlation of 0.94 with Dim2), so a method's vertical position mainly reflects how widely it is perceived to be accepted.

11.1.3 Conclusion:

This perception map illustrates the trade-offs in different payment methods:

Cash is viewed as the safest and most private option, but at the cost of convenience and speed.
Digital payment methods (Credit, Debit, Mobile) are associated with efficiency, ease of use, and control, but potentially at the expense of security and privacy.
NeoBanks are viewed as a distinct category, perhaps due to concerns over trust or unfamiliarity compared to traditional banking systems.

This visualization helps us understand consumer preferences when choosing payment methods and their perceived benefits and drawbacks.

12 Hypotheses

12.1 Majority of young people use cash at least once a month.

Let’s check if the assumptions to perform a parametric test are met:

n * 𝜋> 5 -> 304 * 0.5 = 152 > 5

n(1-𝜋) > 5 -> 304 * 0.5 = 152 > 5

Both assumptions are met, so we can proceed with the parametric test (test of population proportion).

H0: π = 0.5

H1: π > 0.5

sum(NLB_data$Cash_Use > 1, na.rm = TRUE)

## [1] 281

prop.test(x = 281,
          n = 304,
          p = 0.5,
          correct = FALSE,
          alternative = "greater")

## 
##  1-sample proportions test without continuity correction
## 
## data:  281 out of 304, null probability 0.5
## X-squared = 218.96, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is greater than 0.5
## 95 percent confidence interval:
##  0.8954807 1.0000000
## sample estimates:
##         p 
## 0.9243421

We reject H0 at p<0.001: the proportion of young people who still use cash at least once a month is larger than 50%.
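The X-squared value in the output can be cross-checked by hand. With correct = FALSE, prop.test reports the square of the usual one-sample z statistic, z = (p̂ − π0) / sqrt(π0(1 − π0) / n). The sketch below recomputes it from the counts used above; the variable names are illustrative.

```r
x  <- 281   # respondents who use cash at least once a month
n  <- 304   # sample size used in the test
p0 <- 0.5   # proportion under H0

p_hat <- x / n
z     <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # one-sample z statistic
p_val <- pnorm(z, lower.tail = FALSE)             # one-sided p-value for H1: pi > p0

# prop.test() reports the square of z as X-squared when correct = FALSE
res <- prop.test(x = x, n = n, p = p0, correct = FALSE, alternative = "greater")
print(c(z = z, z_squared = z^2, X_squared = unname(res$statistic)))
```

The same check applies to every proportion test in this chapter, since they all use the same call with different counts.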

12.2 Cash is primarily used by young people (18–27) when digital payment methods are unavailable.

To test this, we will do a test of population proportion. Let’s check if assumptions are met.

n * 𝜋> 5 -> 304 * 0.5 = 152 > 5

n(1-𝜋) > 5 -> 304 * 0.5 = 152 > 5

Assumptions are met, so we can proceed with the test of population proportion.

H0: 𝜋 = 0.5

H1: 𝜋 > 0.5

sum(NLB_data$Use_Cash_IfNoDigital > 5, na.rm = TRUE)

## [1] 200

prop.test(x = 200,
          n = 304,
          p = 0.5,
          correct = FALSE,
          alternative = "greater")

## 
##  1-sample proportions test without continuity correction
## 
## data:  200 out of 304, null probability 0.5
## X-squared = 30.316, df = 1, p-value = 1.836e-08
## alternative hypothesis: true p is greater than 0.5
## 95 percent confidence interval:
##  0.6119223 1.0000000
## sample estimates:
##         p 
## 0.6578947

We reject H0. The proportion of people who agree or completely agree that they use cash when digital payments are not available is significantly larger than 50% (p<0.001).
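The tests in this chapter count successes with na.rm = TRUE but pass a fixed n = 304, which is only exact when the item has no missing answers. One way to keep the two in step is a small wrapper that takes n from the non-missing responses. The function name test_majority and the toy vector below are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: one-sided test that more than half of the non-missing
# answers exceed a cutoff on a Likert-type item
test_majority <- function(responses, cutoff, p0 = 0.5) {
  answered <- responses[!is.na(responses)]
  n <- length(answered)
  x <- sum(answered > cutoff)
  # Rule-of-thumb check used in the text: n * pi > 5 and n * (1 - pi) > 5
  stopifnot(n * p0 > 5, n * (1 - p0) > 5)
  prop.test(x = x, n = n, p = p0, correct = FALSE, alternative = "greater")
}

# Illustrative data only (not the survey responses): 7-point scale with two NAs
toy <- c(7, 6, 6, 5, 7, 3, 6, NA, 7, 2, 6, 6, 4, 7, NA, 6, 1, 7, 6, 5)
result <- test_majority(toy, cutoff = 5)
print(result$estimate)
```

On the survey data the call would look like test_majority(NLB_data$Use_Cash_IfNoDigital, cutoff = 5), assuming the column is numeric as in the code above.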

12.3 Most young people use cash for small payments (up to 10 EUR).

To test this, we will do a test of population proportion. Let’s check if assumptions are met.

n * 𝜋> 5 -> 304 * 0.5 = 152 > 5

n(1-𝜋) > 5 -> 304 * 0.5 = 152 > 5

Assumptions are met, so we can proceed with the test of population proportion.

H0: 𝜋 = 0.5

H1: 𝜋 > 0.5

sum(NLB_data$Cash_Up10 > 1, na.rm = TRUE)

## [1] 270

prop.test(x = 270,
          n = 304,
          p = 0.5,
          correct = FALSE,
          alternative = "greater")

## 
##  1-sample proportions test without continuity correction
## 
## data:  270 out of 304, null probability 0.5
## X-squared = 183.21, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is greater than 0.5
## 95 percent confidence interval:
##  0.8549349 1.0000000
## sample estimates:
##         p 
## 0.8881579

We reject H0 at p<0.001. We have found that more than 50% use cash for their small payments (up to 10 EUR) at least some of the time.

Let’s repeat the test for the proportion of people who use cash for less than half of their small payments.

H0: 𝜋 = 0.5

H1: 𝜋 > 0.5

sum(NLB_data$Cash_Up10 == 2, na.rm = TRUE)

## [1] 169

prop.test(x = 169,
          n = 304,
          p = 0.5,
          correct = FALSE,
          alternative = "greater")

## 
##  1-sample proportions test without continuity correction
## 
## data:  169 out of 304, null probability 0.5
## X-squared = 3.8026, df = 1, p-value = 0.02559
## alternative hypothesis: true p is greater than 0.5
## 95 percent confidence interval:
##  0.5087589 1.0000000
## sample estimates:
##         p 
## 0.5559211

We reject H0 at p = 0.026. We have found that the proportion of people who use cash for less than half of their small payments is greater than 50%.

12.4 Young people who save, mainly save in cash.

NLB_data_save <- NLB_data %>% select(20)

NLB_data_save$Save_Form <-ifelse(test = NLB_data_save$Save_Form == -2,
                              yes = NA,
                              no = NLB_data_save$Save_Form)

library(tidyr)
NLB_data_save <- drop_na(NLB_data_save)

sum(NLB_data_save$Save_Form < 4, na.rm = TRUE)

## [1] 34

NLB_data_save$Save_FormF<- factor(NLB_data_save$Save_Form,
                       levels = c(1, 2, 3, 4, 5, 6, 7),
                       labels = c("Fully in cash", "Up to 25% in cash", "25% and 50% in cash", "About 50% in cash, 50% digital", "25% to 50% digital", "Up to 25% digital", "Fully digital"))

library(ggplot2)
library(dplyr)

# Calculate percentages
data_percent <- NLB_data_save %>%
  count(Save_FormF) %>%
  mutate(percentage = (n / sum(n)) * 100)

# Create the bar plot with NLB colors
ggplot(data_percent, aes(x = Save_FormF, y = percentage)) +
  geom_bar(stat = "identity", fill = NLB_colors[1], color = "black") +  # Apply NLB Indigo Blue color
  labs(title = "How do you save?", 
       x = "Preference", 
       y = "Percentage") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

We have 233 people who save in our sample. Based on the frequency graph and the calculations, only 34 of them (14.6%) save more than 50% in cash. We therefore reject our hypothesis that young people who save mostly do so in cash.
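The decision above rests on the descriptive share alone. For consistency with the other hypotheses, the same one-sided proportion test could be run on the reported counts; this is a sketch of that optional step, not part of the original analysis. Because the sample proportion (about 14.6%) lies far below 0.5, the test cannot reject H0 in favour of π > 0.5.

```r
# Counts reported in the text: 34 of 233 savers keep more than half in cash
res_save <- prop.test(x = 34, n = 233, p = 0.5,
                      correct = FALSE, alternative = "greater")
print(res_save)
```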

12.5 Young people (18–27) mostly use the money in the same form they receive it.

To test this, we will do a test of population proportion. Let’s check if assumptions are met.

n * 𝜋> 5 -> 304 * 0.5 = 152 > 5

n(1-𝜋) > 5 -> 304 * 0.5 = 152 > 5

Assumptions are met, so we can proceed with the test of population proportion.

H0: 𝜋 = 0.5

H1: 𝜋 > 0.5

NLB_form <- NLB_data %>% select(21)

sum(NLB_form$Spend_SameForm > 4, na.rm = TRUE)

## [1] 207

prop.test(x = 207,
          n = 304,
          p = 0.5,
          correct = FALSE,
          alternative = "greater")

## 
##  1-sample proportions test without continuity correction
## 
## data:  207 out of 304, null probability 0.5
## X-squared = 39.803, df = 1, p-value = 1.405e-10
## alternative hypothesis: true p is greater than 0.5
## 95 percent confidence interval:
##  0.6355172 1.0000000
## sample estimates:
##         p 
## 0.6809211

We reject H0 at p<0.001. We have found that more than 50% of young people somewhat agree, agree or completely agree that they tend to use money in the same form as they receive it.

12.6 The convenience of digital payments is a more important factor influencing adoption among young people (18–27) than security.

library(dplyr)

NLB_R6H6 <- NLB_data %>% select(c(29, 32))

head(NLB_R6H6)

## # A tibble: 6 × 2
##   Importance_Ease Importance_Security
##             <dbl>               <dbl>
## 1               7                   6
## 2               6                   7
## 3               7                   6
## 4               7                   7
## 5               7                   7
## 6               5                   7

str(NLB_R6H6)

## tibble [304 × 2] (S3: tbl_df/tbl/data.frame)
##  $ Importance_Ease    : num [1:304] 7 6 7 7 7 5 6 6 4 7 ...
##  $ Importance_Security: num [1:304] 6 7 6 7 7 7 5 7 7 7 ...

NLB_R6H6 <- NLB_R6H6 %>% mutate_all(as.numeric)

str(NLB_R6H6)

## tibble [304 × 2] (S3: tbl_df/tbl/data.frame)
##  $ Importance_Ease    : num [1:304] 7 6 7 7 7 5 6 6 4 7 ...
##  $ Importance_Security: num [1:304] 6 7 6 7 7 7 5 7 7 7 ...

summary(NLB_R6H6)

##  Importance_Ease Importance_Security
##  Min.   :1.000   Min.   :1.000      
##  1st Qu.:6.000   1st Qu.:5.000      
##  Median :7.000   Median :7.000      
##  Mean   :6.105   Mean   :6.076      
##  3rd Qu.:7.000   3rd Qu.:7.000      
##  Max.   :7.000   Max.   :7.000

NLB_R6H6 <- na.omit(NLB_R6H6)

# Calculate the mean of column 29, Importance_Ease (Enostavnost = ease of use)
mean_enostavnost <- mean(NLB_R6H6[[1]], na.rm = TRUE)

print(mean_enostavnost)

## [1] 6.105263

# Calculate the mean of column 32, Importance_Security (Varnost = security)
mean_varnost <- mean(NLB_R6H6[[2]], na.rm = TRUE)

print(mean_varnost)

## [1] 6.075658

NLB_R6H6$diffs <- NLB_R6H6$Importance_Ease - NLB_R6H6$Importance_Security

shapiro_result <- shapiro.test(NLB_R6H6$diffs)

print(shapiro_result)

## 
##  Shapiro-Wilk normality test
## 
## data:  NLB_R6H6$diffs
## W = 0.8771, p-value = 6.971e-15

Since the p-value is much smaller than 0.05, we reject the null hypothesis that the differences are normally distributed.

This means the data does not follow a normal distribution, which justifies using a non-parametric test like the Wilcoxon Signed-Rank test instead of a paired t-test.

wilcox.test(
  NLB_R6H6[[1]], 
  NLB_R6H6[[2]], 
  paired = TRUE, 
  correct = FALSE, 
  exact = FALSE, 
  alternative = "two.sided")

## 
##  Wilcoxon signed rank test
## 
## data:  NLB_R6H6[[1]] and NLB_R6H6[[2]]
## V = 5908, p-value = 0.537
## alternative hypothesis: true location shift is not equal to 0

Since the p-value (0.537) is greater than 0.05, we fail to reject the null hypothesis: there is no statistically significant difference between the perceived importance of ease of use (Importance_Ease) and security (Importance_Security).

Therefore, our data does not support the hypothesis that convenience is a more important factor than security in digital payment adoption among young people (18–27). Both factors appear to have similar levels of importance, with sample means of 6.11 and 6.08 respectively.
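The hypothesis is directional (convenience more important than security), so a one-sided alternative is another reasonable choice for this test. The sketch below shows the call on simulated ratings; the data are invented and the result says nothing about the survey.

```r
set.seed(1)  # reproducible illustrative data, not the survey responses

# Simulated 7-point importance ratings for 100 hypothetical respondents
ease     <- sample(4:7, size = 100, replace = TRUE, prob = c(0.1, 0.2, 0.3, 0.4))
security <- sample(4:7, size = 100, replace = TRUE, prob = c(0.1, 0.2, 0.3, 0.4))

# One-sided paired test: H1 states that ease ratings tend to exceed security ratings
res_one_sided <- wilcox.test(ease, security,
                             paired = TRUE,
                             correct = FALSE,
                             exact = FALSE,
                             alternative = "greater")
print(res_one_sided$p.value)
```

On the survey data the two vectors would be NLB_R6H6$Importance_Ease and NLB_R6H6$Importance_Security.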

12.7 Young people (18–27) mostly prefer a society that predominantly uses digital payments but remain reluctant to completely eliminate cash.

NLB_R4H4 <- NLB_data %>% select(85)

library(dplyr)

NLB_R4H4$Switch_Digital[NLB_R4H4$Switch_Digital == -1] <- 4

NLB_R4H4$Q22F <- factor(NLB_R4H4$Switch_Digital,
                       levels = c(1, 2, 3, 4),
                       labels = c("Fully digital", "Balance digital-cash", "Cash", "Don't know"))

library(ggplot2)
library(dplyr)

# Calculate percentages
data_percent <- NLB_R4H4 %>%
  count(Q22F) %>%
  mutate(percentage = (n / sum(n)) * 100)

# Create the bar plot with percentages
ggplot(data_percent, aes(x = Q22F, y = percentage)) +
  geom_bar(stat = "identity", fill = NLB_colors[1], color = "black") +  # Apply NLB Blue color
  labs(title = "Would you switch to a fully digital society?", 
       x = "Preference", 
       y = "Percentage") +
  theme_minimal()

head(data_percent)

## # A tibble: 4 × 3
##   Q22F                     n percentage
##   <fct>                <int>      <dbl>
## 1 Fully digital           78      25.7 
## 2 Balance digital-cash   171      56.2 
## 3 Cash                    44      14.5 
## 4 Don't know              11       3.62

From the graph, we can see that the largest share of respondents (56.2%) would consider switching to a mostly digital society but do not want to fully give up cash.

To further test this, we will do a test of population proportion. Let’s check if assumptions are met.

n * 𝜋> 5 -> 304 * 0.5 = 152 > 5

n(1-𝜋) > 5 -> 304 * 0.5 = 152 > 5

Assumptions are met, so we can proceed with the test of population proportion.

H0: 𝜋 = 0.5

H1: 𝜋 > 0.5

prop.test(x = 171,
          n = 304,
          p = 0.5,
          correct = FALSE,
          alternative = "greater")

## 
##  1-sample proportions test without continuity correction
## 
## data:  171 out of 304, null probability 0.5
## X-squared = 4.75, df = 1, p-value = 0.01465
## alternative hypothesis: true p is greater than 0.5
## 95 percent confidence interval:
##  0.5153528 1.0000000
## sample estimates:
##      p 
## 0.5625

We reject H0 at p = 0.015. The population proportion of people who favor a predominantly digital society but are not prepared to fully give up cash is larger than 50%.

12.8 Young people (18–27) are motivated to use digital payments when splitting bills with friends due to the simplicity of transferring exact amounts.

library(dplyr) 

NLB_R8H8 <- NLB_data %>% select(c(69))

head(NLB_R8H8)

## # A tibble: 6 × 1
##   Reason_ExactSum
##   <chr>          
## 1 1              
## 2 1              
## 3 1              
## 4 1              
## 5 1              
## 6 1

str(NLB_R8H8)

## tibble [304 × 1] (S3: tbl_df/tbl/data.frame)
##  $ Reason_ExactSum: chr [1:304] "1" "1" "1" "1" ...

col_name <- colnames(NLB_R8H8)[1] 

NLB_R8H8$Reason_ExactSum <- factor(NLB_R8H8$Reason_ExactSum,
                       levels = c(1, 0, -2),
                       labels = c("Transfer whole amounts", "Other reasons", "Not applicable"))



library(dplyr)

NLB_R8H8 %>% count (Reason_ExactSum)

## # A tibble: 3 × 2
##   Reason_ExactSum            n
##   <fct>                  <int>
## 1 Transfer whole amounts   236
## 2 Other reasons             28
## 3 Not applicable            40

To test this, we will do a test of population proportion. Let’s check if assumptions are met.

n * 𝜋> 5 -> 264 * 0.5 = 132 > 5

n(1-𝜋) > 5 -> 264 * 0.5 = 132 > 5

Assumptions are met, so we can proceed with the test of population proportion.

H0: 𝜋 = 0.5

H1: 𝜋 > 0.5

prop.test(x = 236,
          n = 264,
          p = 0.5,
          correct = FALSE,
          alternative = "greater")

## 
##  1-sample proportions test without continuity correction
## 
## data:  236 out of 264, null probability 0.5
## X-squared = 163.88, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is greater than 0.5
## 95 percent confidence interval:
##  0.8586738 1.0000000
## sample estimates:
##         p 
## 0.8939394

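Since the p-value is below 0.05, we reject the null hypothesis H₀. Among the 264 respondents for whom the question applies, 89.4% cite the simplicity of transferring exact amounts, and there is strong statistical evidence that the true population proportion is greater than 50%.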
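The sample size of 264 in this test comes from excluding the 40 "Not applicable" answers. A minimal sketch of that step is given below, using the counts from the frequency table; the object names are illustrative.

```r
# Counts copied from the frequency table above
counts <- c("Transfer whole amounts" = 236,
            "Other reasons"          = 28,
            "Not applicable"         = 40)

# Keep only respondents for whom the question applies
applicable <- counts[names(counts) != "Not applicable"]
x <- unname(applicable["Transfer whole amounts"])
n <- sum(applicable)

res_split <- prop.test(x = x, n = n, p = 0.5,
                       correct = FALSE, alternative = "greater")
print(c(x = x, n = n, p_hat = x / n))
```

Deriving x and n from the same filtered vector avoids a mismatch between numerator and denominator if the exclusion rule changes.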

13 Final conclusions based on the analysis

Monthly Cash Usage

Rationale: Despite the ongoing push toward digital payments, it was hypothesized that cash had not yet disappeared from young people’s routines. The literature suggests that cash remains a reserve method or is associated with specific purchasing habits (Puusniekka, 2020).
Outcome: Supported. A proportion test indicated that 92.4% of young respondents use cash at least once per month (p < 0.001), confirming that it continues to play a role in daily life.

Cash Usage Out of Necessity (Lack of Digital Alternatives)

Rationale: It was hypothesized that cash usage is primarily driven by the unavailability of digital payment options among merchants rather than a genuine preference, as young people generally favor the convenience of digital payments.
Outcome: Supported. 65.8% of respondents agreed that they use cash mainly when digital payments are not accepted (p < 0.001).

Small Payments (Under €10)

Rationale: Traditionally, small expenditures (e.g., snacks or beverages) have been dominated by cash due to perceptions of speed and control.
Outcome: Supported, with nuances. Although 88.8% of respondents use cash for these small purchases “at least occasionally,” further analysis revealed that 55.6% actually use cash for less than half of their small transactions, indicating increasing digital penetration even for micro-payments.

Savings in Physical Cash

Rationale: Based on the concept of “sacred money” and the desire to keep savings “protected from spending temptations,” it was hypothesized that young people would prefer physical cash savings (Puusniekka, 2020).
Outcome: Rejected. Only 14.6% of savers keep more than half of their savings in cash. The majority of young respondents prefer to accumulate savings in digital form.

Consistency Between Income Form and Spending Form

Rationale: It was hypothesized that a form of behavioral inertia exists: if a young person receives money in cash (e.g., gifts or allowances), they tend to spend it in the same form rather than transferring it to a card.
Outcome: Supported. 68.1% of young respondents tend to spend money in the same form in which it is received (p < 0.001).

Convenience vs. Security

Rationale: Several studies on Generation Z suggest that speed and ease of use are stronger drivers than security (Demir et al., 2024).
Outcome: Rejected. A Wilcoxon signed-rank test (p = 0.537) found no statistically significant difference: the data give no evidence that young people weight convenience more heavily than security when selecting a payment method.

“Cashless” Society, But Not Completely

Rationale: It was hypothesized that there is a psychological resistance to completely eliminating cash, which is perceived as an indispensable tool for control and privacy.
Outcome: Supported. 56.2% of respondents favor a predominantly digital society but are not willing to give up cash entirely (p = 0.015).

Splitting Expenses Among Friends

Rationale: It was hypothesized that the ability to transfer exact amounts via mobile apps (e.g., Flik or MobilePay) is the main reason for abandoning cash in social contexts.
Outcome: Supported. Among respondents for whom the question applies, 89.4% (236 of 264) cited the simplicity of transferring exact amounts as a reason for using digital payments when splitting expenses (p < 0.001).

TAQR - Assignment 2

Christian Lasalvia

17th December 2025
