Select a Binary Variable

We transform the payment_method variable into a binary format to predict the likelihood of using “UPI”.

# Convert payment_method to binary: 1 if "UPI", 0 otherwise
data <- data |>
  mutate(binary_status = if_else(payment_method == "UPI", 1, 0))

Logistic Regression Model

We’ll build a logistic regression model to predict binary_status based on explanatory variables such as product_amount, transaction_fee, and cashback.

# Fit logistic regression model
logit_model <- glm(binary_status ~ product_amount + transaction_fee + cashback, 
                   data = data, family = binomial(link='logit'))
summary(logit_model)
## 
## Call:
## glm(formula = binary_status ~ product_amount + transaction_fee + 
##     cashback, family = binomial(link = "logit"), data = data)
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     -1.563e+00  1.136e-01 -13.756   <2e-16 ***
## product_amount  -8.279e-06  1.227e-05  -0.675   0.5000    
## transaction_fee  3.796e-03  2.436e-03   1.558   0.1191    
## cashback         2.334e-03  1.243e-03   1.878   0.0604 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 5001.3  on 4999  degrees of freedom
## Residual deviance: 4994.9  on 4996  degrees of freedom
## AIC: 5002.9
## 
## Number of Fisher Scoring iterations: 4
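
The null and residual deviances in the output above can be compared to gauge whether the three predictors jointly improve on an intercept-only model. The following is a minimal sketch of that likelihood-ratio check, using the deviance figures copied from the summary rather than the fitted object; the variable names are illustrative.

```r
# Deviances and degrees of freedom copied from the summary() output above
null_dev  <- 5001.3
null_df   <- 4999
resid_dev <- 4994.9
resid_df  <- 4996

# Likelihood-ratio statistic: drop in deviance from adding the three predictors
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df

# Upper-tail chi-squared probability for that drop
lr_p <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

cat("LR statistic:", round(lr_stat, 1), "on", lr_df, "df; p-value:", round(lr_p, 3), "\n")
```

A p-value above 0.05 here would mean the predictors, taken together, do not improve the fit at the 5% level. With the fitted object available, `anova(logit_model, test = "Chisq")` should give a comparable sequential breakdown.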

Interpretation of Logistic Regression Coefficients

Intercept (-1.563): Represents the log odds of using UPI when all predictors are zero. This corresponds to a probability of about 0.17, so UPI usage is unlikely under these conditions.

Product Amount (-8.279e-06): For each unit increase in product_amount, the log odds of using UPI decrease by approximately 0.0000083, but this effect is not statistically significant (p = 0.5000).

Transaction Fee (3.796e-03): For each unit increase in transaction_fee, the log odds of using UPI increase by about 0.0038; however, this coefficient is also not statistically significant (p = 0.1191).

Cashback (2.334e-03): For each unit increase in cashback, the log odds of opting for UPI increase by approximately 0.0023. This falls just short of statistical significance at the 5% level (p = 0.0604).
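
Log odds are hard to read directly, so it can help to convert them. The sketch below exponentiates the slope estimates from the summary output into odds ratios and converts the intercept into a baseline probability. The estimates are typed in by hand from the output above, and the object names are illustrative.

```r
# Estimates copied from the summary() output above
coefs <- c(
  "(Intercept)"   = -1.563,
  product_amount  = -8.279e-06,
  transaction_fee =  3.796e-03,
  cashback        =  2.334e-03
)

# exp() turns a log-odds coefficient into an odds ratio per unit increase
odds_ratios <- exp(coefs[-1])

# plogis() is the inverse logit: probability of UPI when all predictors are 0
baseline_prob <- plogis(coefs[["(Intercept)"]])

print(round(odds_ratios, 5))
print(round(baseline_prob, 3))
```

An odds ratio above 1 means the odds of UPI rise with the predictor, and one below 1 means they fall. With the fitted object in memory, `exp(coef(logit_model))` gives the same conversion without retyping.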

Confidence Interval for Coefficients

We can use the standard errors of coefficients to calculate confidence intervals, giving insights into the stability of our estimates.

# Calculate confidence interval for the product_amount coefficient
coef_estimate <- summary(logit_model)$coefficients["product_amount", "Estimate"]
coef_se <- summary(logit_model)$coefficients["product_amount", "Std. Error"]
conf_int <- coef_estimate + c(-1.96, 1.96) * coef_se
conf_int
## [1] -3.233528e-05  1.577759e-05

Interpretation of the Confidence Interval

C.I. = (-3.23e-05, 1.58e-05): This interval suggests that we are 95% confident that the true effect of product_amount on the log odds of choosing UPI payment falls between approximately -0.0000323 and 0.0000158. Because the interval includes zero, we cannot rule out that product_amount has no effect on the likelihood of selecting UPI as a payment method, which is consistent with its p-value of 0.5000.
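
The same Wald calculation extends to every coefficient. The sketch below applies it to all four estimates and standard errors, copied by hand from the summary output, and flags which intervals contain zero. It uses `qnorm(0.975)` in place of the rounded 1.96, so the bounds may differ slightly in the last digits.

```r
# Estimates and standard errors copied from the summary() output above
est <- c("(Intercept)" = -1.563, product_amount = -8.279e-06,
         transaction_fee = 3.796e-03, cashback = 2.334e-03)
se  <- c("(Intercept)" = 1.136e-01, product_amount = 1.227e-05,
         transaction_fee = 2.436e-03, cashback = 1.243e-03)

z <- qnorm(0.975)  # critical value for a two-sided 95% interval

# Wald interval: estimate plus or minus z standard errors
wald_ci <- cbind(lower = est - z * se, upper = est + z * se)

# Flag intervals that contain zero (1 = contains zero, 0 = does not)
includes_zero <- wald_ci[, "lower"] < 0 & wald_ci[, "upper"] > 0
wald_ci <- cbind(wald_ci, includes_zero = includes_zero)

print(signif(wald_ci, 4))
```

With the fitted object available, `confint.default(logit_model)` computes Wald intervals directly from the model.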

Comparing Payment Method and Cashback

# Convert payment_method to binary variable
data <- data %>%
  mutate(payment_method_binary = if_else(payment_method == "UPI", "UPI", "Non-UPI"))
# Add predicted probabilities to data
data <- data %>%
  mutate(predicted_prob = predict(logit_model, type = "response"))

# Plot predicted probabilities against cashback, colored by payment_method_binary
ggplot(data, aes(x = cashback, y = predicted_prob, color = payment_method_binary)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "loess", se = FALSE) +
  labs(title = "Predicted Probability of UPI Payment by Cashback",
       x = "Cashback",
       y = "Predicted Probability of UPI",
       color = "Payment Method") +
  scale_color_manual(values = c("Non-UPI" = "red", "UPI" = "black"), 
                     labels = c("Non-UPI", "UPI")) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
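
To see how the fitted model turns cashback into a predicted probability, the coefficients can be applied by hand. The sketch below does this for a cashback grid while holding the other predictors fixed. The helper `predict_upi_prob()` and all input values (a product amount of 5000, a fee of 25, cashback from 0 to 100) are hypothetical and chosen only for illustration; they are not taken from the dataset.

```r
# Estimates copied from the summary() output above
b <- c(intercept = -1.563, product_amount = -8.279e-06,
       transaction_fee = 3.796e-03, cashback = 2.334e-03)

# Hypothetical helper: predicted P(UPI) for given predictor values
predict_upi_prob <- function(product_amount, transaction_fee, cashback) {
  log_odds <- b[["intercept"]] +
    b[["product_amount"]]  * product_amount +
    b[["transaction_fee"]] * transaction_fee +
    b[["cashback"]]        * cashback
  plogis(log_odds)  # inverse logit maps log odds to a probability
}

# Hypothetical inputs: amount and fee held fixed, cashback varied
cashback_grid <- c(0, 25, 50, 75, 100)
probs <- predict_upi_prob(product_amount = 5000, transaction_fee = 25,
                          cashback = cashback_grid)

print(data.frame(cashback = cashback_grid, predicted_prob = round(probs, 3)))
```

Because the cashback coefficient is positive, the predicted probability rises along the grid. How far it rises depends on the cashback range actually present in the data.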

Insights: Predicted Probability of UPI Payment by Cashback

Trend Analysis: The graph indicates a positive relationship between the cashback amount and the predicted probability of using UPI (Unified Payments Interface) as a payment method. As cashback increases, the predicted likelihood of selecting UPI also increases.

Probability Range: The predicted probability for UPI payments ranges approximately from 0.18 to 0.24. Even at lower cashback amounts, the model gives roughly an 18% chance of opting for UPI, and this rises modestly with higher cashback.

Payment Method Distinction: The plot separates UPI (black) from non-UPI (red) payments. The red line (non-UPI) is positioned lower than the black line (UPI), which suggests that transactions actually paid by UPI tend to have higher predicted probabilities as cashback increases.

Box Plot

# Create the box plot
ggplot(data, aes(x = payment_method_binary, y = cashback, fill = payment_method_binary)) +
  geom_boxplot() +
  labs(title = "Box Plot of Cashback by Payment Method",
       x = "Payment Method",
       y = "Cashback") +
  scale_fill_manual(values = c("Non-UPI" = "gray", "UPI" = "black")) +
  theme_minimal()

Insights:

Cashback Distribution: The box plot shows that the cashback distribution for UPI payments (black box) is generally higher than that for non-UPI payments (gray box). The median cashback for UPI appears to be above the median for non-UPI payments, indicating that users who pay by UPI tend to receive higher cashback.

Spread of Data: The interquartile range (IQR) for UPI seems larger, suggesting greater variability in the cashback received with UPI than with non-UPI. UPI transactions may therefore produce both notably high and notably low cashback amounts.

Outliers: Any points outside the whiskers would indicate outliers, meaning a few users receive much higher or lower cashback than the rest of their group. Such cases may warrant further investigation.
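
The readings from the box plot (medians, IQRs, outliers) can be checked numerically rather than by eye. The sketch below shows one way to do so in base R. The `toy` data frame is made up purely for illustration and should be replaced with the real `data`; its values say nothing about the actual results.

```r
# Made-up toy data for illustration only; swap in the real `data` frame
toy <- data.frame(
  payment_method_binary = c("UPI", "UPI", "UPI", "UPI",
                            "Non-UPI", "Non-UPI", "Non-UPI", "Non-UPI"),
  cashback = c(10, 40, 55, 90, 5, 30, 45, 60)
)

# Median and interquartile range of cashback within each payment group
group_summary <- aggregate(
  cashback ~ payment_method_binary,
  data = toy,
  FUN = function(x) c(median = median(x), iqr = IQR(x))
)
print(group_summary)

# boxplot.stats() flags points beyond 1.5 * IQR from the hinges,
# which is close to the default whisker rule used by box plots
n_outliers <- tapply(toy$cashback, toy$payment_method_binary,
                     function(x) length(boxplot.stats(x)$out))
print(n_outliers)
```

Comparing the two medians and IQRs directly confirms or refutes the visual impression, and the outlier counts show whether any points fall beyond the whiskers.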

Overall Insights:

UPI as a Preferred Payment Method: The analyses suggest that higher cashback amounts are associated with greater use of UPI, although the cashback effect falls just short of significance at the 5% level (p = 0.0604). If the relationship holds, businesses could promote UPI transactions through cashback offers.

Targeted Promotions: Understanding the relationship between cashback and payment method can help in formulating targeted promotions. Increasing cashback incentives may encourage more users to adopt UPI for their transactions.

Further Analysis: It might be beneficial to explore the demographic or behavioral characteristics of users opting for UPI versus non-UPI payments, and to investigate the reasons behind the variability in cashback amounts.