Introduction

In this data dive, we explore a dataset of digital wallet transactions to understand transaction amounts, product categories, payment methods, and more. We will perform descriptive statistics and visualize the data to gain deeper insights.

Summary Statistics for Numeric Data

We compute summary statistics for numeric columns like product_amount and transaction_fee to understand the data distribution.

# Numeric summary for product_amount
summary(data$product_amount)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.09 2453.98 4943.69 4957.50 7444.81 9996.95
# Additional statistics for product_amount
mean_product_amount <- mean(data$product_amount, na.rm = TRUE)
sd_product_amount <- sd(data$product_amount, na.rm = TRUE)

# Display the mean and standard deviation
mean_product_amount
## [1] 4957.503
sd_product_amount
## [1] 2885.034
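
The same statistics can be gathered for several numeric columns in one pass. The following is a minimal sketch, assuming a small made-up data frame `toy` standing in for the real `data`; only the column names product_amount and transaction_fee come from the text above, and the values are hypothetical.

```r
# Toy stand-in for the real dataset; the values are made up for illustration.
toy <- data.frame(
  product_amount  = c(120.5, 980.0, 4500.25, NA, 7600.1),
  transaction_fee = c(1.2, 9.8, 45.0, 30.5, NA)
)

# Columns to summarise (names taken from the text; adjust to the real data).
numeric_cols <- c("product_amount", "transaction_fee")

# One row per column: mean, standard deviation, and number of missing values.
numeric_summary <- data.frame(
  column    = numeric_cols,
  mean      = sapply(toy[numeric_cols], mean, na.rm = TRUE),
  sd        = sapply(toy[numeric_cols], sd, na.rm = TRUE),
  n_missing = sapply(toy[numeric_cols], function(x) sum(is.na(x))),
  row.names = NULL
)

numeric_summary
```

Reporting the missing-value count next to the mean and standard deviation makes it clear how many rows `na.rm = TRUE` silently dropped.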

Summary Statistics for Categorical Data

For categorical data, we compute the unique values and their counts for columns such as product_category and transaction_status.

# Unique values and their counts for product_category
product_category_counts <- table(data$product_category) 
product_category_counts
## 
##        Bus Ticket     Education Fee  Electricity Bill    Flight Booking 
##               235               286               252               216 
##     Food Delivery    Gaming Credits          Gas Bill         Gift Card 
##               259               231               250               221 
##  Grocery Shopping     Hotel Booking Insurance Premium     Internet Bill 
##               238               274               225               233 
##    Loan Repayment   Mobile Recharge      Movie Ticket   Online Shopping 
##               245               241               272               243 
##      Rent Payment Streaming Service         Taxi Fare        Water Bill 
##               251               299               256               273
# Unique values and their counts for transaction_status

transaction_status_counts <- table(data$transaction_status)
transaction_status_counts
## 
##     Failed    Pending Successful 
##        146         99       4755
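
Raw counts are easier to compare as percentages. A small sketch, using the three status counts printed above; the variable names `status_counts` and `status_pct` are introduced here for illustration only.

```r
# Counts copied from the transaction_status table above.
status_counts <- c(Failed = 146, Pending = 99, Successful = 4755)

# prop.table() divides each count by the total; scale to percentages.
status_pct <- round(100 * prop.table(status_counts), 2)
status_pct
```

On the full dataset, `prop.table(table(data$transaction_status))` should give the same proportions directly. Roughly 95.1% of the 5,000 transactions are successful, so failed and pending transactions are comparatively rare.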

Which Product Category Has the Most Cashback, and Why?

To gauge the effectiveness of cashback programs, we calculate the total and average cashback amount for each product category, which can then be compared with the overall transaction amount.

# Aggregate cashback amount by product category
cashback_by_category <- data %>%
  group_by(product_category) %>%
  summarise(total_cashback = sum(cashback, na.rm = TRUE),
            avg_cashback = mean(cashback, na.rm = TRUE))

# Display the results
cashback_by_category
## # A tibble: 20 × 3
##    product_category  total_cashback avg_cashback
##    <chr>                      <dbl>        <dbl>
##  1 Bus Ticket                11850.         50.4
##  2 Education Fee             14380.         50.3
##  3 Electricity Bill          13203.         52.4
##  4 Flight Booking            11306.         52.3
##  5 Food Delivery             13578.         52.4
##  6 Gaming Credits            11212.         48.5
##  7 Gas Bill                  12183.         48.7
##  8 Gift Card                 11551.         52.3
##  9 Grocery Shopping          12093.         50.8
## 10 Hotel Booking             13641.         49.8
## 11 Insurance Premium         11089.         49.3
## 12 Internet Bill             11751.         50.4
## 13 Loan Repayment            12005.         49.0
## 14 Mobile Recharge           12222.         50.7
## 15 Movie Ticket              13814.         50.8
## 16 Online Shopping           11996.         49.4
## 17 Rent Payment              12996.         51.8
## 18 Streaming Service         15352.         51.3
## 19 Taxi Fare                 13252.         51.8
## 20 Water Bill                13821.         50.6

OUTPUT: Average cashback per transaction is nearly uniform across categories, ranging from about 48.5 (Gaming Credits) to 52.4 (Electricity Bill and Food Delivery). Differences in total cashback therefore mostly reflect transaction counts: Streaming Service has the highest total (about 15,352) because it also has the most transactions (299), while Insurance Premium has the lowest total (about 11,089).

The travel-related categories do not stand out in this dataset. Flight Booking has an above-average cashback per transaction (52.3) but one of the lowest totals (about 11,306), and Hotel Booking (49.8) falls in the lower half. Travel companies often partner with digital wallets on exclusive cashback offers because bookings are high-value and the sector is competitive, but these figures do not show travel as a high-cashback category compared to routine transactions like groceries or utilities.
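
One way to compare cashback with the overall transaction amount is to express it as a percentage of spending. The sketch below assumes the rounded totals printed in this report for four categories (cashback totals from the table above, spending totals from the spending table later in the report); the data frame `totals` and the column `cashback_pct` are names introduced here for illustration.

```r
# Totals copied from the printed tables in this report (rounded as displayed).
totals <- data.frame(
  product_category = c("Flight Booking", "Hotel Booking",
                       "Streaming Service", "Gaming Credits"),
  total_cashback   = c(11306, 13641, 15352, 11212),
  total_spending   = c(1124283, 1319604, 1462462, 1136665)
)

# Cashback as a percentage of the amount spent in each category.
totals$cashback_pct <- 100 * totals$total_cashback / totals$total_spending

# Sort from the highest to the lowest cashback rate.
totals[order(-totals$cashback_pct), ]
```

For these four categories the rate comes out close to 1% in every case, which suggests cashback scales with spending rather than favouring particular categories.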

Which Payment Methods Have the Highest Success and Failure Rates, and Why?

Let's calculate the success rate by payment method.

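# Count successful transactions for each payment method.
# Note: the denominator is nrow(data) (all transactions), so success_rate is each
# method's share of all transactions, not the success rate within that method.
success_rate_by_payment <- data %>%
  filter(transaction_status == "Successful") %>%
  group_by(payment_method) %>%
  summarise(successful_transactions = n()) %>%
  mutate(success_rate = (successful_transactions / nrow(data)) * 100)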

# Display the success rates
success_rate_by_payment
## # A tibble: 5 × 3
##   payment_method successful_transactions success_rate
##   <chr>                            <int>        <dbl>
## 1 Bank Transfer                      998         20.0
## 2 Credit Card                        933         18.7
## 3 Debit Card                         988         19.8
## 4 UPI                                950         19  
## 5 Wallet Balance                     886         17.7

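OUTPUT: Because each count is divided by the total number of transactions, the percentages above show how the 4,755 successful transactions are distributed across payment methods rather than how reliable each method is. Bank Transfer (998) and Debit Card (988) account for the most successful transactions, and Wallet Balance (886) for the fewest. Overall, 4,755 of 5,000 transactions (95.1%) succeeded.

Established infrastructure could make traditional methods such as bank transfers, credit cards, and debit cards more reliable, and technical or connectivity issues could cause more wallet failures, especially in regions with less robust payment processing. Confirming this requires dividing each method's successes and failures by that method's own transaction count.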
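
A per-method success rate uses each method's own transaction count as the denominator. The following is a minimal sketch with dplyr, assuming a made-up data frame `toy` with hypothetical rows; only the column names payment_method and transaction_status and the status labels come from the report.

```r
library(dplyr)

# Toy stand-in for the real dataset; the rows are made up for illustration.
toy <- data.frame(
  payment_method     = c("UPI", "UPI", "UPI",
                         "Credit Card", "Credit Card", "Wallet Balance"),
  transaction_status = c("Successful", "Failed", "Successful",
                         "Successful", "Pending", "Successful")
)

# Group first, then compute rates within each payment method.
rates_by_payment <- toy %>%
  group_by(payment_method) %>%
  summarise(
    total_transactions = n(),
    success_rate = 100 * mean(transaction_status == "Successful"),
    failure_rate = 100 * mean(transaction_status == "Failed")
  )

rates_by_payment
```

Taking the mean of a logical comparison gives the fraction of rows that match, so no separate filter step is needed and failed or pending rows stay in the denominator.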

Aggregating and Analyzing Spending Patterns

Which categories show the highest and lowest average spending per transaction?

We can begin by aggregating data to analyze spending patterns across different product categories.

# Aggregate total spending by product category
spending_by_category <- data %>%
  group_by(product_category) %>%
  summarise(total_spending = sum(product_amount, na.rm = TRUE),
            avg_spending = mean(product_amount, na.rm = TRUE),
            total_transactions = n())

# Display the results
spending_by_category
## # A tibble: 20 × 4
##    product_category  total_spending avg_spending total_transactions
##    <chr>                      <dbl>        <dbl>              <int>
##  1 Bus Ticket              1115712.        4748.                235
##  2 Education Fee           1349322.        4718.                286
##  3 Electricity Bill        1245973.        4944.                252
##  4 Flight Booking          1124283.        5205.                216
##  5 Food Delivery           1317106.        5085.                259
##  6 Gaming Credits          1136665.        4921.                231
##  7 Gas Bill                1361520.        5446.                250
##  8 Gift Card               1022796.        4628.                221
##  9 Grocery Shopping        1134973.        4769.                238
## 10 Hotel Booking           1319604.        4816.                274
## 11 Insurance Premium       1032609.        4589.                225
## 12 Internet Bill           1247178.        5353.                233
## 13 Loan Repayment          1210682.        4942.                245
## 14 Mobile Recharge         1196335.        4964.                241
## 15 Movie Ticket            1337400.        4917.                272
## 16 Online Shopping         1207396.        4969.                243
## 17 Rent Payment            1258264.        5013.                251
## 18 Streaming Service       1462462.        4891.                299
## 19 Taxi Fare               1306566.        5104.                256
## 20 Water Bill              1400669.        5131.                273

OUTPUT: When aggregating spending by product category, we observe that average spending per transaction is similar across categories, ranging from about 4,589 (Insurance Premium) to 5,446 (Gas Bill). Gas Bill and Internet Bill (5,353) have the highest averages, while Insurance Premium and Gift Card (4,628) have the lowest. Total spending is driven largely by transaction count: Streaming Service has the highest total (about 1.46 million) and the most transactions (299), while Gift Card has the lowest total (about 1.02 million). Grocery Shopping, with 238 transactions and an average of 4,769, is neither unusually frequent nor unusually low in value.
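
Sorting the averages answers the highest-and-lowest question directly. The sketch below copies the avg_spending values, rounded as printed in the table above, into a named vector; `avg_spending_printed` and `sorted_avg` are names introduced here for illustration.

```r
# Average spending per transaction, copied from the printed table (rounded).
avg_spending_printed <- c(
  "Bus Ticket" = 4748, "Education Fee" = 4718, "Electricity Bill" = 4944,
  "Flight Booking" = 5205, "Food Delivery" = 5085, "Gaming Credits" = 4921,
  "Gas Bill" = 5446, "Gift Card" = 4628, "Grocery Shopping" = 4769,
  "Hotel Booking" = 4816, "Insurance Premium" = 4589, "Internet Bill" = 5353,
  "Loan Repayment" = 4942, "Mobile Recharge" = 4964, "Movie Ticket" = 4917,
  "Online Shopping" = 4969, "Rent Payment" = 5013, "Streaming Service" = 4891,
  "Taxi Fare" = 5104, "Water Bill" = 5131
)

# Sort from highest to lowest average spending.
sorted_avg <- sort(avg_spending_printed, decreasing = TRUE)

head(sorted_avg, 3)  # three highest averages
tail(sorted_avg, 3)  # three lowest averages
```

On the full dataset, `spending_by_category %>% arrange(desc(avg_spending))` should produce the same ordering without retyping the values.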

Violin Plot of Product Amounts by Category

We use a violin plot to visualize the distribution of transaction amounts across product categories. A violin plot combines the benefits of a box plot and a kernel density plot, which makes it easier to see both the spread and the density of transactions within each category.

# Violin plot for product_amount by product_category with custom y-axis increments
ggplot(data, aes(x = product_category, y = product_amount, fill = product_category)) +
  geom_violin(trim = FALSE) +  # Add violin plot without trimming tails
  theme_minimal() +
  labs(title = "Distribution of Product Amount by Product Category", 
       x = "Product Category", 
       y = "Product Amount") +
  scale_y_continuous(breaks = seq(0, max(data$product_amount, na.rm = TRUE), by = 1500)) +  # Set y-axis increments to 1500
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for better readability

Observations and Conclusion

Spending and cashback are distributed fairly evenly across product categories: average transaction amounts fall between roughly 4,600 and 5,400, and average cashback between roughly 48 and 52. Totals differ mainly because of transaction counts, with Streaming Service leading on both total spending and total cashback.

About 95% of transactions succeed overall. Wallet Balance contributes the fewest successful transactions, but per-method success rates are needed before concluding that wallet payments are less reliable than traditional bank or card transactions.

These insights highlight key areas for platform improvement, including optimizing cashback strategies, investigating failed and pending transactions by payment method, and targeting marketing efforts based on spending patterns across different categories.