Dataset Overview: The Digital Wallet Transactions Dataset simulates transactions from a digital wallet platform similar to services like PayTm (India) or Khalti (Nepal). It contains 5,000 synthetic records that represent various financial transactions across multiple product categories. The dataset can be used to analyze spending behaviors, transaction patterns, and payment methods.
Link to Dataset: Click Here
Documentation: The dataset provides realistic features such as product names, merchant details, transaction fees, and cashback, simulating modern digital wallet behavior.
Main Goal: The project aims to investigate spending patterns, transaction success rates, and usage trends of different payment methods on digital wallet platforms. A further area of focus is the effect of cashback on the final amount paid, and the effect of loyalty programs on customer engagement and transaction volumes. The dataset can also be used to explore seasonal or device-related trends.
Transaction Status Across Payment Methods:
# Bar plot for Transaction Status across Payment Methods
ggplot(data, aes(x = payment_method, fill = transaction_status)) +
  geom_bar(position = "dodge") +
  labs(title = "Transaction Status Across Payment Methods",
       x = "Payment Method",
       y = "Count of Transactions",
       fill = "Transaction Status") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
This bar plot shows the distribution of transaction statuses (e.g., Successful, Failed, Pending) across the various payment methods (e.g., Credit Card, Debit Card, etc.). It helps identify whether certain payment methods are more prone to failed or pending transactions.
Why it’s interesting: These insights can help improve transaction reliability and user experience by identifying payment methods with high failure rates. Here, Bank Transfer shows a higher success rate than the other methods.
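Because a dodged bar plot shows counts rather than rates, a claim about success rates is easier to check with a per-method summary. The following is a minimal sketch, assuming the column names used in the plot above (`payment_method`, `transaction_status`) and the status label "Successful". The `toy` data frame and the helper `success_rate_by_method()` are hypothetical and only stand in for the real dataset.

```r
library(dplyr)

# Toy stand-in for the real dataset (values are made up for illustration)
toy <- tibble(
  payment_method = c("Bank Transfer", "Bank Transfer", "Credit Card",
                     "Credit Card", "UPI", "UPI"),
  transaction_status = c("Successful", "Successful", "Successful",
                         "Failed", "Pending", "Successful")
)

# Hypothetical helper: share of successful transactions within each payment method
success_rate_by_method <- function(df) {
  df %>%
    group_by(payment_method) %>%
    summarize(
      n_transactions = n(),
      success_rate = mean(transaction_status == "Successful"),
      .groups = "drop"
    ) %>%
    arrange(desc(success_rate))
}

rates <- success_rate_by_method(toy)
print(rates)
```

On the real data, `success_rate_by_method(data)` would give one row per payment method, which makes it possible to compare methods that have different transaction volumes.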
Cashback vs. Loyalty Points:
# Scatter plot for Cashback vs. Loyalty Points
ggplot(data, aes(x = cashback, y = loyalty_points)) +
  geom_point(alpha = 0.4) + # Plot each transaction as a point
  geom_smooth(method = "lm", se = FALSE, color = "green") + # Adding a trend line
  labs(title = "Cashback vs. Loyalty Points",
       x = "Cashback",
       y = "Loyalty Points") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
This scatter plot visualizes the relationship between cashback and loyalty points. The trend line added helps to see if there’s a linear relationship between the amount of cashback and loyalty points earned during the transactions.
Why it’s interesting: This could highlight user behaviors in response to incentives and help develop targeted marketing strategies or personalized offers to boost engagement.
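To put a number on the linear relationship that the trend line suggests, a Pearson correlation test is one option. This is only a sketch using base R's `cor.test()`. The `toy` values are invented for illustration, and the column names `cashback` and `loyalty_points` are taken from the plot code above.

```r
# Toy stand-in for the real dataset (values are made up for illustration)
toy <- data.frame(
  cashback = c(10, 25, 40, 55, 70, 85),
  loyalty_points = c(120, 340, 90, 510, 260, 430)
)

# Pearson correlation with a test of the null hypothesis "correlation = 0"
ct <- cor.test(toy$cashback, toy$loyalty_points, method = "pearson")

cat("Correlation estimate:", unname(ct$estimate), "\n")
cat("p-value:", ct$p.value, "\n")
```

On the real data, the same call would be `cor.test(data$cashback, data$loyalty_points)`. An estimate near zero would indicate that the fitted line has little explanatory value.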
To-Do List:
Exploratory Data Analysis (EDA):
Visualize transaction statuses (Successful, Failed, Pending) across payment methods.
Analyze spending patterns across product categories and merchants.
Investigate the distribution of cashback and loyalty points across various product categories.
Transaction Success Prediction:
Select relevant features (e.g., payment method, device type, transaction amount, location) for prediction.
Train a logistic regression or decision tree model to predict transaction success.
Evaluate model performance (accuracy, precision, recall) and analyze feature importance.
Cashback and Loyalty Program Analysis:
User Segmentation:
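The Transaction Success Prediction steps in the list above could be sketched roughly as follows. This is not the project's model: the data is simulated, the column name `device_type` and all category labels are assumptions, and the 0.5 decision threshold is an arbitrary choice. On the real dataset, the outcome would be derived as `as.integer(transaction_status == "Successful")`.

```r
set.seed(42)
n <- 400

# Simulated stand-in for the real dataset (all values are made up)
toy <- data.frame(
  payment_method = factor(sample(c("Bank Transfer", "Credit Card", "UPI"), n, replace = TRUE)),
  device_type    = factor(sample(c("Android", "iOS", "Web"), n, replace = TRUE)),
  product_amount = runif(n, 100, 10000)
)
# Simulated binary outcome; the dependence on amount is invented for the demo
toy$success <- rbinom(n, 1, plogis(1 - 0.0002 * toy$product_amount))

# 80/20 train/test split
train_idx <- sample(seq_len(n), size = round(0.8 * n))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# Logistic regression on the selected features
model <- glm(success ~ payment_method + device_type + product_amount,
             data = train, family = binomial)

# Predicted probabilities and class labels on the held-out set
prob <- predict(model, newdata = test, type = "response")
pred <- as.integer(prob > 0.5)

# Confusion-matrix counts and the metrics named in the to-do list
tp <- sum(pred == 1 & test$success == 1)
fp <- sum(pred == 1 & test$success == 0)
fn <- sum(pred == 0 & test$success == 1)

accuracy  <- mean(pred == test$success)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)

cat("Accuracy:", accuracy, " Precision:", precision, " Recall:", recall, "\n")
summary(model)$coefficients
```

The coefficient table gives a first look at feature importance for a logistic model. A decision tree would need a separate package and is not shown here.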
Hypothesis 1: Transaction Fees Affect Transaction Status
success_data <- data %>% filter(transaction_status == "Successful")
failed_data <- data %>% filter(transaction_status == "Failed")
# Combine the filtered data for visualization
# (transaction_status is already set by the filters above, so no relabeling is needed)
transaction_data <- bind_rows(success_data, failed_data)
# Create boxplot for transaction fees by transaction status
ggplot(transaction_data, aes(x = transaction_status, y = transaction_fee, fill = transaction_status)) +
  geom_boxplot() +
  labs(title = "Transaction Fee by Transaction Status",
       x = "Transaction Status",
       y = "Transaction Fee") +
  theme_minimal() +
  scale_fill_manual(values = c("Successful" = "lightgreen", "Failed" = "salmon")) +
  theme(legend.position = "none")
The median transaction fees for successful and failed transactions are close, which is consistent with the t-test result of no significant difference in transaction fees between the two groups.
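The t-test mentioned above is not shown in this section. One way it could be run is with base R's `t.test()`, which uses Welch's unequal-variance test by default. The sketch below uses made-up fee values in a hypothetical `toy` data frame; the column names follow the boxplot code.

```r
# Toy stand-in for transaction_data (fee values are made up for illustration)
toy <- data.frame(
  transaction_status = rep(c("Successful", "Failed"), each = 6),
  transaction_fee = c(12.5, 30.1, 22.4, 41.0, 18.3, 27.7,
                      14.2, 29.5, 24.8, 38.6, 17.1, 26.9)
)

# Welch two-sample t-test: do mean fees differ between the two statuses?
fee_test <- t.test(transaction_fee ~ transaction_status, data = toy)

cat("Group means:", fee_test$estimate, "\n")
cat("p-value:", fee_test$p.value, "\n")
```

On the real data, the call would be `t.test(transaction_fee ~ transaction_status, data = transaction_data)`. A p-value above the chosen significance level (commonly 0.05) means the data do not show a difference in mean fees.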
# Compute the final amount paid and flag whether any cashback was received
data <- data %>%
  mutate(final_amount_paid = product_amount - cashback + transaction_fee)
data <- data %>%
  mutate(cashback_category = ifelse(cashback > 0, "Cashback Received", "No Cashback"))
# Summarize the average final amount paid and average cashback by cashback category
avg_final_amount_cashback <- data %>%
  group_by(cashback_category) %>%
  summarize(
    avg_final_amount = mean(final_amount_paid, na.rm = TRUE),
    avg_cashback = mean(cashback, na.rm = TRUE),
    count = n() # Count the number of transactions per category
  )
# Print the summary table
print(avg_final_amount_cashback)
## # A tibble: 2 × 4
##   cashback_category avg_final_amount avg_cashback count
##   <chr>                        <dbl>        <dbl> <int>
## 1 Cashback Received            4932.         50.7  4999
## 2 No Cashback                  4911.          0       1
# Visualize the effect of cashback on the average final amount paid
ggplot(avg_final_amount_cashback, aes(x = cashback_category, y = avg_final_amount, fill = cashback_category)) +
  geom_bar(stat = "identity", color = "black") +
  labs(title = "Average Final Amount Paid by Cashback Category",
       x = "Cashback Category",
       y = "Average Final Amount Paid") +
  theme_minimal() +
  scale_fill_manual(values = c("No Cashback" = "lightblue", "Cashback Received" = "lightgreen")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels for better readability
The bars for “Cashback Received” and “No Cashback” are nearly the same height (about 4,932 vs. 4,911), which at first glance suggests that cashback does not noticeably affect the final amount paid. However, the “No Cashback” group contains only one of the 5,000 transactions, so this comparison cannot support a conclusion about whether cashback incentives influence customer spending behavior or transaction sizes in this dataset.
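Since almost every transaction receives some cashback, a split into equally sized cashback groups may be more informative than a yes/no split. The sketch below bins cashback into quartiles with `dplyr::ntile()` and compares the average final amount paid across bins. The `toy` data is simulated and the name `cashback_quartile` is introduced here for illustration; the other column names follow the code above.

```r
library(dplyr)

set.seed(1)

# Simulated stand-in for the real dataset (all values are made up)
toy <- tibble(
  product_amount  = runif(200, 100, 10000),
  cashback        = runif(200, 0, 100),
  transaction_fee = runif(200, 0, 50)
) %>%
  mutate(final_amount_paid = product_amount - cashback + transaction_fee)

# Split transactions into four equally sized groups by cashback amount,
# then summarize each group
by_quartile <- toy %>%
  mutate(cashback_quartile = ntile(cashback, 4)) %>%
  group_by(cashback_quartile) %>%
  summarize(
    avg_final_amount = mean(final_amount_paid),
    avg_cashback = mean(cashback),
    count = n(),
    .groups = "drop"
  )

print(by_quartile)
```

If the average final amount is roughly flat across quartiles on the real data, that would be better evidence that cashback has little effect on transaction size than a comparison against a single transaction.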
By Product Category:
# Summarize cashback by product category
cashback_by_category <- data %>%
  group_by(product_category) %>%
  summarize(total_cashback = sum(cashback, na.rm = TRUE))
# Bar plot showing total cashback for each product category
ggplot(cashback_by_category, aes(x = reorder(product_category, -total_cashback), y = total_cashback)) +
  geom_bar(stat = "identity", fill = "lightblue") +
  labs(title = "Total Cashback by Product Category",
       x = "Product Category",
       y = "Total Cashback") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotating x-axis labels for readability
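Total cashback per category depends on how many transactions each category has, so it can help to report the average cashback per transaction alongside the total. The following is a small sketch of that summary. The category names and values in `toy` are invented, and `cashback_summary` is a name introduced here.

```r
library(dplyr)

# Toy stand-in for the real dataset (categories and values are made up)
toy <- tibble(
  product_category = c("Groceries", "Groceries", "Groceries",
                       "Travel", "Travel", "Electronics"),
  cashback = c(10, 20, 30, 50, 70, 90)
)

# Total and per-transaction average cashback for each category
cashback_summary <- toy %>%
  group_by(product_category) %>%
  summarize(
    n_transactions = n(),
    total_cashback = sum(cashback, na.rm = TRUE),
    avg_cashback = mean(cashback, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_cashback))

print(cashback_summary)
```

In this toy example, the category with the highest total is not the one with the highest average, which is the kind of difference the extra column is meant to expose.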