Dataset Overview: The Digital Wallet Transactions Dataset simulates transactions from a digital wallet platform similar to services like Paytm (India) or Khalti (Nepal). It contains 5,000 synthetic records that represent financial transactions across multiple product categories. The dataset can be used to analyze spending behaviors, transaction patterns, and payment methods.


Documentation: The dataset provides realistic features such as product names, merchant details, transaction fees, and cashback, simulating modern digital wallet behavior.

Main Goal: The project aims to investigate spending patterns, transaction success rates, and usage trends of different payment methods on digital wallet platforms. A further area of focus is the effect of cashback on the final amount paid, and of loyalty programs on customer engagement and transaction volumes. The dataset can also be used to explore seasonal or device-related trends.

Interesting Aspects of the Data for Further Investigation:

Transaction Status Across Payment Methods:

# Bar plot for Transaction Status across Payment Methods
# Assumes `data` has already been loaded (e.g., with read.csv) and contains
# the columns payment_method and transaction_status
library(ggplot2)

ggplot(data, aes(x = payment_method, fill = transaction_status)) +
  geom_bar(position = "dodge") +
  labs(title = "Transaction Status Across Payment Methods",
       x = "Payment Method",
       y = "Count of Transactions",
       fill = "Transaction Status") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for readability

This bar plot shows how transaction statuses (e.g., Successful, Failed, Pending) are distributed across the payment methods (e.g., Credit Card, Debit Card). It helps identify whether certain payment methods are more prone to failed or pending transactions.

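Why it’s interesting: These insights can help improve transaction reliability and user experience by identifying payment methods with high failure rates. In this plot, Bank Transfer shows more successful transactions than the other methods; because the bars show raw counts rather than rates, part of this difference may reflect how often each method is used.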
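Because the dodged bars show raw counts, a payment method that is used more often will also tend to show more successes. A minimal base-R sketch that converts the counts into within-method shares, so methods can be compared on rates instead; the helper name status_shares is hypothetical, and the column names are assumed to match the plot above.

```r
# Hypothetical helper: share of each transaction status within each payment
# method. Assumes columns payment_method and transaction_status as in the plot.
status_shares <- function(df) {
  counts <- table(df$payment_method, df$transaction_status)
  # margin = 1 divides each row (one payment method) by its row total,
  # so every row sums to 1
  prop.table(counts, margin = 1)
}

# Usage on the report's data frame:
# round(status_shares(data), 3)
```

Reading the "Successful" column of the result gives a success rate per method that does not depend on how many transactions each method has.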

Cashback vs. Loyalty Points:

# Scatter plot for Cashback vs. Loyalty Points
ggplot(data, aes(x = cashback, y = loyalty_points)) +
  geom_point(alpha = 0.3) +  # Individual transactions (semi-transparent to show overlap)
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "green") +  # Linear trend line
  labs(title = "Cashback vs. Loyalty Points",
       x = "Cashback",
       y = "Loyalty Points") +
  theme_minimal()

This scatter plot visualizes the relationship between cashback and loyalty points. The trend line added helps to see if there’s a linear relationship between the amount of cashback and loyalty points earned during the transactions.

Why it’s interesting: This could highlight user behaviors in response to incentives and help develop targeted marketing strategies or personalized offers to boost engagement.
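The trend line shows the direction of the relationship only visually. A minimal sketch that quantifies it, fitting the same least-squares line that geom_smooth(method = "lm") draws and reporting Pearson's correlation with its p-value; the helper name is hypothetical and the column names are assumed to match the plot.

```r
# Hypothetical helper: fit the same least-squares line that
# geom_smooth(method = "lm") draws, and report Pearson's r with its p-value.
cashback_loyalty_summary <- function(df) {
  fit <- lm(loyalty_points ~ cashback, data = df)
  ct <- cor.test(df$cashback, df$loyalty_points, method = "pearson")
  list(
    slope = unname(coef(fit)["cashback"]),  # change in loyalty points per unit of cashback
    r = unname(ct$estimate),                # Pearson correlation coefficient
    p_value = ct$p.value                    # test of H0: correlation = 0
  )
}

# Usage: cashback_loyalty_summary(data)
```

If the scatter looks monotonic but clearly non-linear, cor.test(..., method = "spearman") gives a rank-based alternative.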

To-Do List:

Exploratory Data Analysis (EDA):

Transaction Success Prediction:

Cashback and Loyalty Program Analysis:

User Segmentation:

Initial Findings

Hypothesis 1: Transaction Fees Affect Transaction Status

library(dplyr)

# Keep only successful and failed transactions for the comparison
transaction_data <- data %>%
  filter(transaction_status %in% c("Successful", "Failed"))

# Create boxplot for transaction fees by transaction status
ggplot(transaction_data, aes(x = transaction_status, y = transaction_fee, fill = transaction_status)) +
  geom_boxplot() +
  labs(title = "Transaction Fee by Transaction Status",
       x = "Transaction Status",
       y = "Transaction Fee") +
  theme_minimal() +
  scale_fill_manual(values = c("Successful" = "lightgreen", "Failed" = "salmon")) +
  theme(legend.position = "none")

The median transaction fees for successful and failed transactions are close, which is consistent with the t-test result showing no significant difference in transaction fees between the two groups.
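The t-test mentioned above is not shown in this section. One way it could be run is sketched below, assuming a Welch two-sample t-test (the default in R's t.test()); because the boxplot compares medians, a Wilcoxon rank-sum test is added as a rank-based check. The helper name compare_fees is hypothetical, and the column names follow the boxplot code.

```r
# Hypothetical helper: compare transaction fees between successful and failed
# transactions. Assumes columns transaction_fee and transaction_status, with
# status values "Successful" and "Failed" as in the boxplot code.
compare_fees <- function(df) {
  sub <- df[df$transaction_status %in% c("Successful", "Failed"), ]
  # Re-factor so that only the two compared statuses remain as levels
  sub$transaction_status <- factor(sub$transaction_status)
  list(
    # Welch two-sample t-test (var.equal = FALSE is R's default)
    welch_t = t.test(transaction_fee ~ transaction_status, data = sub),
    # Wilcoxon rank-sum test; exact = FALSE uses the normal approximation,
    # which avoids warnings when fees contain ties
    wilcoxon = wilcox.test(transaction_fee ~ transaction_status, data = sub,
                           exact = FALSE)
  )
}

# Usage on the report's data frame:
# res <- compare_fees(data)
# res$welch_t$p.value
# res$wilcoxon$p.value
```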

Hypothesis 2: Does Cashback Impact Final Amount Paid?

data <- data %>%
  mutate(final_amount_paid = product_amount - cashback + transaction_fee)

data <- data %>%
  mutate(cashback_category = ifelse(cashback > 0, "Cashback Received", "No Cashback"))

# Summarize the average final amount paid and average cashback by cashback category
avg_final_amount_cashback <- data %>%
  group_by(cashback_category) %>%
  summarize(
    avg_final_amount = mean(final_amount_paid, na.rm = TRUE),
    avg_cashback = mean(cashback, na.rm = TRUE),
    count = n()  # Count the number of transactions per category
  )

# Print the summary table
print(avg_final_amount_cashback)
## # A tibble: 2 × 4
##   cashback_category avg_final_amount avg_cashback count
##   <chr>                        <dbl>        <dbl> <int>
## 1 Cashback Received            4932.         50.7  4999
## 2 No Cashback                  4911.          0       1
# Visualize the effect of cashback on the average final amount paid
ggplot(avg_final_amount_cashback, aes(x = cashback_category, y = avg_final_amount, fill = cashback_category)) +
  geom_bar(stat = "identity", color = "black") +
  labs(title = "Average Final Amount Paid by Cashback Category",
       x = "Cashback Category",
       y = "Average Final Amount Paid") +
  theme_minimal() +
  scale_fill_manual(values = c("No Cashback" = "lightblue", "Cashback Received" = "lightgreen")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for better readability

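The bars for “Cashback Received” and “No Cashback” are nearly the same height (average final amounts of about 4,932 and 4,911), which at face value suggests that cashback does not noticeably change the final amount paid. However, the summary table shows that 4,999 of the 5,000 transactions received cashback, so the “No Cashback” bar reflects a single transaction. This comparison therefore cannot show whether cashback incentives influence customer spending behavior or transaction sizes in this dataset; comparing transactions across levels of cashback would be more informative.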
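Since almost every transaction received some cashback, one option is to compare transactions across cashback levels instead. Note that final_amount_paid is defined above as product_amount - cashback + transaction_fee, so for a fixed product amount it falls mechanically as cashback rises; product_amount is the more direct measure of spending. A minimal base-R sketch, assuming those columns exist (the helper name is hypothetical):

```r
# Hypothetical helper: group transactions by cashback quartile and compare
# the average product amount and final amount paid in each group.
# Assumes numeric columns cashback, product_amount and final_amount_paid.
spend_by_cashback_quartile <- function(df) {
  # Quartile cut points; unique() guards against repeated cut points,
  # in which case fewer than four groups are produced
  breaks <- unique(quantile(df$cashback, probs = seq(0, 1, 0.25), na.rm = TRUE))
  df$cashback_quartile <- cut(df$cashback, breaks = breaks, include.lowest = TRUE)
  aggregate(cbind(product_amount, final_amount_paid) ~ cashback_quartile,
            data = df, FUN = mean)
}

# Usage (after final_amount_paid has been created with mutate()):
# spend_by_cashback_quartile(data)
```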

By Product Category:

# Summarize cashback by product category
cashback_by_category <- data %>%
  group_by(product_category) %>%
  summarize(total_cashback = sum(cashback, na.rm = TRUE))

# Bar plot showing total cashback for each product category
ggplot(cashback_by_category, aes(x = reorder(product_category, -total_cashback), y = total_cashback)) +
  geom_bar(stat = "identity", fill = "lightblue") +
  labs(title = "Total Cashback by Product Category",
       x = "Product Category",
       y = "Total Cashback") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotating x-axis labels for readability
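A category can top the total-cashback chart simply because it has more transactions. A minimal base-R sketch that reports the total alongside the mean cashback per transaction and the transaction count, so the two effects can be separated; the helper name is hypothetical and the column names follow the code above.

```r
# Hypothetical helper: total cashback per category conflates the number of
# transactions with the cashback per transaction, so report all three.
cashback_summary_by_category <- function(df) {
  total   <- aggregate(cashback ~ product_category, data = df, FUN = sum)
  mean_cb <- aggregate(cashback ~ product_category, data = df, FUN = mean)
  n       <- aggregate(cashback ~ product_category, data = df, FUN = length)
  # aggregate() returns groups in the same sorted order each time,
  # so the columns line up row by row
  out <- data.frame(product_category = total$product_category,
                    total_cashback   = total$cashback,
                    mean_cashback    = mean_cb$cashback,
                    n_transactions   = n$cashback)
  out[order(-out$total_cashback), ]  # largest total first, as in the bar plot
}

# Usage: cashback_summary_by_category(data)
```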