In this data dive, we explore a dataset of digital wallet transactions to understand transaction amounts, product categories, payment methods, and more. We will perform descriptive statistics and visualize the data to gain deeper insights.
We compute summary statistics for numeric columns such as product_amount and transaction_fee to understand how their values are distributed; the output below covers product_amount.
# Numeric summary for product_amount
summary(data$product_amount)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.09 2453.98 4943.69 4957.50 7444.81 9996.95
# Additional statistics for product_amount
mean_product_amount <- mean(data$product_amount, na.rm = TRUE)
sd_product_amount <- sd(data$product_amount, na.rm = TRUE)
# Display the mean and standard deviation
mean_product_amount
## [1] 4957.503
sd_product_amount
## [1] 2885.034
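The text above mentions transaction_fee as well, but only product_amount is summarised. The sketch below computes the same statistics for every numeric column in one pass, assuming the other numeric columns can be treated the same way. `summarise_numeric()` is a hypothetical helper and `toy` is made-up data standing in for `data`. With the real dataset you would call `summarise_numeric(data)`.

```r
# Hypothetical helper: mean, sd, median and NA count for every numeric column
summarise_numeric <- function(df) {
  num_cols <- df[vapply(df, is.numeric, logical(1))]  # keep numeric columns only
  stats <- vapply(num_cols, function(x) c(
    mean   = mean(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    n_na   = sum(is.na(x))
  ), numeric(4))
  t(stats)  # one row per column, one column per statistic
}

# Made-up toy data standing in for `data`
toy <- data.frame(
  product_amount  = c(100, 250, 400, NA),
  transaction_fee = c(1.5, 2.0, 3.5, 2.5),
  payment_method  = c("UPI", "UPI", "Credit Card", "Wallet Balance")
)

fee_stats <- summarise_numeric(toy)
print(fee_stats)
```

Non-numeric columns such as payment_method are skipped automatically. `na.rm = TRUE` matches the calls used above, and the extra n_na column shows how many values were dropped.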
For categorical data, we compute the unique values and their counts for columns such as product_category and transaction_status.
# Unique values and their counts for product_category
product_category_counts <- table(data$product_category)
product_category_counts
##
##        Bus Ticket     Education Fee  Electricity Bill    Flight Booking
##               235               286               252               216
##     Food Delivery    Gaming Credits          Gas Bill         Gift Card
##               259               231               250               221
##  Grocery Shopping     Hotel Booking Insurance Premium     Internet Bill
##               238               274               225               233
##    Loan Repayment   Mobile Recharge      Movie Ticket   Online Shopping
##               245               241               272               243
##      Rent Payment Streaming Service         Taxi Fare        Water Bill
##               251               299               256               273
# Unique values and their counts for transaction_status
transaction_status_counts <- table(data$transaction_status)
transaction_status_counts
##
##     Failed    Pending Successful
##        146         99       4755
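Raw counts are easier to compare as percentages. Here is a minimal sketch using base R's `prop.table()`. The counts are copied from the transaction_status table above. With the real data you would pass `table(data$transaction_status)` directly.

```r
# Counts copied from the transaction_status table above
status_counts <- c(Failed = 146, Pending = 99, Successful = 4755)

# prop.table() divides each count by the total; multiply by 100 for percentages
status_pct <- round(100 * prop.table(status_counts), 2)
print(status_pct)
```

Of the 5,000 transactions, 95.1% succeeded, 2.92% failed and 1.98% are still pending.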
To gauge the effectiveness of cashback programs, we calculate the total and average cashback for each product category. These figures can then be compared with the transaction amounts in each category.
library(dplyr)  # provides %>%, group_by() and summarise()
# Aggregate cashback amount by product category
cashback_by_category <- data %>%
group_by(product_category) %>%
summarise(total_cashback = sum(cashback, na.rm = TRUE),
avg_cashback = mean(cashback, na.rm = TRUE))
# Display the results
cashback_by_category
## # A tibble: 20 × 3
## product_category total_cashback avg_cashback
## <chr> <dbl> <dbl>
## 1 Bus Ticket 11850. 50.4
## 2 Education Fee 14380. 50.3
## 3 Electricity Bill 13203. 52.4
## 4 Flight Booking 11306. 52.3
## 5 Food Delivery 13578. 52.4
## 6 Gaming Credits 11212. 48.5
## 7 Gas Bill 12183. 48.7
## 8 Gift Card 11551. 52.3
## 9 Grocery Shopping 12093. 50.8
## 10 Hotel Booking 13641. 49.8
## 11 Insurance Premium 11089. 49.3
## 12 Internet Bill 11751. 50.4
## 13 Loan Repayment 12005. 49.0
## 14 Mobile Recharge 12222. 50.7
## 15 Movie Ticket 13814. 50.8
## 16 Online Shopping 11996. 49.4
## 17 Rent Payment 12996. 51.8
## 18 Streaming Service 15352. 51.3
## 19 Taxi Fare 13252. 51.8
## 20 Water Bill 13821. 50.6
OUTPUT: Average cashback is nearly flat across categories. It ranges only from 48.5 (Gaming Credits) to 52.4 (Electricity Bill and Food Delivery). Differences in total cashback therefore mostly follow transaction volume. Streaming Service has the most transactions (299) and the highest total (15,352), while Insurance Premium (225 transactions) has the lowest total (11,089). Travel-related categories do not stand out. Flight Booking's average (52.3) is near the top of the range, but Hotel Booking's (49.8) is slightly below its midpoint. On its own, this output gives little evidence that cashback is targeted at particular categories. Comparing cashback with spending in each category would be a more direct test.
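The aggregation above totals cashback but does not compare it with spending. Below is a minimal sketch of that comparison using base R's `merge()`. The figures are copied, as printed and therefore rounded, from the cashback table above and the spending table later in this section, for three categories only. With the full objects, the same join could be done on cashback_by_category and spending_by_category by product_category.

```r
# Totals copied (as printed) from the cashback and spending tables in this section
cashback <- data.frame(
  product_category = c("Flight Booking", "Hotel Booking", "Streaming Service"),
  total_cashback   = c(11306, 13641, 15352)
)
spending <- data.frame(
  product_category = c("Flight Booking", "Hotel Booking", "Streaming Service"),
  total_spending   = c(1124283, 1319604, 1462462)
)

# Join on category, then express cashback as a percentage of spending
combined <- merge(cashback, spending, by = "product_category")
combined$cashback_pct <- round(100 * combined$total_cashback / combined$total_spending, 2)
print(combined)
```

In these three rows, cashback works out to roughly 1% of spending (1.01%, 1.03% and 1.05%). This is consistent with the flat per-transaction averages.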
Next, let's count successful transactions by payment method.
success_rate_by_payment <- data %>%
filter(transaction_status == "Successful") %>%
group_by(payment_method) %>%
summarise(successful_transactions = n()) %>%
mutate(success_rate = (successful_transactions / nrow(data)) * 100)  # Denominator is ALL transactions: each method's share of the total, not its own success rate
# Display the success rates
success_rate_by_payment
## # A tibble: 5 × 3
## payment_method successful_transactions success_rate
## <chr> <int> <dbl>
## 1 Bank Transfer 998 20.0
## 2 Credit Card 933 18.7
## 3 Debit Card 988 19.8
## 4 UPI 950 19
## 5 Wallet Balance 886 17.7
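OUTPUT: Because the code divides by nrow(data), the success_rate column is each method's share of all 5,000 transactions. It is not the fraction of that method's own transactions that succeeded. The five counts sum to 4,755, so the percentages add up to the overall success rate of 95.1%. Bank Transfer has the most successful transactions (998, 20.0%) and Wallet Balance the fewest (886, 17.7%). Without each method's total count, however, we cannot tell whether Wallet Balance fails more often or is simply used less. The idea that wallet payments are less reliable, for example because of technical or connectivity issues, remains a hypothesis until per-method rates are computed.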
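The success_rate column above uses nrow(data) as its denominator. A per-method success rate instead divides each method's successful transactions by that method's own transaction count. This is the same as taking the mean of the logical `transaction_status == "Successful"` within each group. Below is a minimal base-R sketch using `tapply()`, where `toy` is made-up data standing in for `data`. In the document's dplyr style, the equivalent would be `group_by(payment_method) %>% summarise(success_rate = mean(transaction_status == "Successful") * 100)`, without filtering first.

```r
# Made-up toy data standing in for `data`
toy <- data.frame(
  payment_method     = c("UPI", "UPI", "UPI", "Credit Card", "Credit Card", "Wallet Balance"),
  transaction_status = c("Successful", "Failed", "Successful", "Successful", "Pending", "Successful")
)

# mean() of a logical vector is the proportion of TRUE values, so this is
# successful / total transactions within each payment method
success_rate <- tapply(toy$transaction_status == "Successful", toy$payment_method, mean) * 100
print(round(success_rate, 1))
```

No rows are dropped before grouping, so a method whose transactions all fail still appears, with a rate of 0.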
Next, we aggregate the data to analyze spending patterns across product categories.
# Aggregate total spending by product category
spending_by_category <- data %>%
group_by(product_category) %>%
summarise(total_spending = sum(product_amount, na.rm = TRUE),
avg_spending = mean(product_amount, na.rm = TRUE),
total_transactions = n())
# Display the results
spending_by_category
## # A tibble: 20 × 4
## product_category total_spending avg_spending total_transactions
## <chr> <dbl> <dbl> <int>
## 1 Bus Ticket 1115712. 4748. 235
## 2 Education Fee 1349322. 4718. 286
## 3 Electricity Bill 1245973. 4944. 252
## 4 Flight Booking 1124283. 5205. 216
## 5 Food Delivery 1317106. 5085. 259
## 6 Gaming Credits 1136665. 4921. 231
## 7 Gas Bill 1361520. 5446. 250
## 8 Gift Card 1022796. 4628. 221
## 9 Grocery Shopping 1134973. 4769. 238
## 10 Hotel Booking 1319604. 4816. 274
## 11 Insurance Premium 1032609. 4589. 225
## 12 Internet Bill 1247178. 5353. 233
## 13 Loan Repayment 1210682. 4942. 245
## 14 Mobile Recharge 1196335. 4964. 241
## 15 Movie Ticket 1337400. 4917. 272
## 16 Online Shopping 1207396. 4969. 243
## 17 Rent Payment 1258264. 5013. 251
## 18 Streaming Service 1462462. 4891. 299
## 19 Taxi Fare 1306566. 5104. 256
## 20 Water Bill 1400669. 5131. 273
OUTPUT: Average spending per transaction varies only modestly across categories. It runs from 4,589 (Insurance Premium) to 5,446 (Gas Bill), a spread far smaller than the overall standard deviation of product_amount (2,885). Total spending is therefore driven largely by transaction volume. Streaming Service has the highest total (1,462,462) and the most transactions (299), yet its average (4,891) is below the median. Gift Card and Insurance Premium combine low counts with the lowest averages and have the smallest totals. Gas Bill (5,446) and Internet Bill (5,353) have the highest averages. Grocery Shopping (4,769 average, 238 transactions) does not show a distinctly frequent, low-value pattern in this dataset.
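Each category's total is, up to rounding, its transaction count times its average amount. A total can therefore be high either because a category is used often or because each purchase is large. The minimal sketch below makes this explicit. The averages and counts are copied, as printed, from the spending table above for five categories.

```r
# Averages and counts copied (as printed) from spending_by_category above
spend <- data.frame(
  product_category   = c("Gas Bill", "Internet Bill", "Streaming Service",
                         "Gift Card", "Insurance Premium"),
  avg_spending       = c(5446, 5353, 4891, 4628, 4589),
  total_transactions = c(250, 233, 299, 221, 225)
)

# total spending = average amount x number of transactions (up to rounding)
spend$implied_total <- spend$avg_spending * spend$total_transactions

# Rank categories by implied total, largest first
ranked <- spend[order(-spend$implied_total), c("product_category", "implied_total")]
print(ranked)
```

Streaming Service ranks first despite the lowest average of these five, because it has the most transactions. The implied totals match the printed totals to within the rounding of the averages (for example, 1,462,409 versus 1,462,462).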
Next, we use a violin plot to visualize the distribution of transaction amounts across product categories. A violin plot draws a mirrored kernel density estimate for each category. Its width shows where transactions are concentrated, and its vertical extent shows their spread. Overlaying a narrow box plot would add the median and quartiles that a box plot provides.
library(ggplot2)
# Violin plot for product_amount by product_category with custom y-axis increments
ggplot(data, aes(x = product_category, y = product_amount, fill = product_category)) +
geom_violin(trim = FALSE) + # Add violin plot without trimming tails
theme_minimal() +
labs(title = "Distribution of Product Amount by Product Category",
x = "Product Category",
y = "Product Amount") +
scale_y_continuous(breaks = seq(0, max(data$product_amount, na.rm = TRUE), by = 1500)) + # Set y-axis increments to 1500
theme(axis.text.x = element_text(angle = 45, hjust = 1),  # Rotate x-axis labels for better readability
      legend.position = "none")  # Hide the legend; the x-axis already names each category
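To see what each violin encodes, here is a minimal sketch with base R's `density()`, which computes a Gaussian kernel density estimate. As far as we know, ggplot2's `geom_violin()` uses the same default bandwidth rule (`bw = "nrd0"`). With `trim = FALSE`, it also extends each violin past the observed range, so violins may dip below zero even though every amount is positive. The amounts here are simulated, not the real data. They are drawn uniformly between 10 and 10,000, an assumption loosely suggested by the evenly spaced quartiles in the summary above.

```r
set.seed(1)
# Simulated amounts (assumption): uniform between 10 and 10,000
amounts <- runif(500, min = 10, max = 10000)

# Gaussian kernel density estimate with the default "nrd0" bandwidth
d <- density(amounts, bw = "nrd0", kernel = "gaussian")

# The estimate integrates to (approximately) 1 over its evaluation grid
area <- sum(d$y) * diff(d$x[1:2])

# By default density() extends the grid 3 bandwidths beyond the data (cut = 3)
print(range(d$x))
print(range(amounts))
```

Each violin in the plot is this kind of curve computed for one category and mirrored around a vertical axis.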
Across categories, average transaction amounts are similar (roughly 4,600 to 5,450). Differences in total spending therefore mostly reflect how often each category is used, with Streaming Service leading on both volume and total. Average cashback is also nearly uniform (about 48.5 to 52.4 per transaction), which gives little sign that cashback is concentrated in specific categories such as travel. Overall, 95.1% of transactions succeed. The payment-method breakdown shows each method's share of successful transactions, so comparing reliability across methods, including wallet payments, requires per-method success rates. These results point to where further analysis could focus: measuring cashback relative to spending, computing per-method success rates, and targeting marketing at the highest-volume categories.