Week 7

This week we formulate and test two hypotheses, each based on a different aspect of our data.

Hypothesis 1: Transaction Fee for Successful vs Failed Transactions

Test Type: Independent two-sample t-test (Welch's, without assuming equal variances), since we are comparing means between two groups: successful and failed transactions.

Alpha Level: 0.05 (5% significance level).

Type 2 Error / Power Level: Power of 0.8, which corresponds to a Type 2 error rate (β) of 0.2, a common choice.

Null Hypothesis (H0): There is no difference in the average transaction fee between successful and failed transactions.

Alternative Hypothesis (H1): The average transaction fee for successful transactions is lower than that for failed transactions.
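For reference, the hypotheses and the statistic of the unequal-variance (Welch) t-test, which is what var.equal = FALSE requests in the code below, can be written as follows. The symbols are notation introduced here for illustration: subscript S stands for successful and F for failed transactions, with μ the population mean, x̄ the sample mean, s the sample standard deviation, and n the group size.

```latex
H_0:\ \mu_S = \mu_F \qquad H_1:\ \mu_S < \mu_F

t = \frac{\bar{x}_S - \bar{x}_F}{\sqrt{\dfrac{s_S^2}{n_S} + \dfrac{s_F^2}{n_F}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_S^2}{n_S} + \dfrac{s_F^2}{n_F}\right)^2}
{\dfrac{(s_S^2/n_S)^2}{n_S - 1} + \dfrac{(s_F^2/n_F)^2}{n_F - 1}}
```

Under this H1, large negative values of t count as evidence against H0. The approximate degrees of freedom ν are generally not an integer.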

# Install the pwr package only if it is not already available
if (!requireNamespace("pwr", quietly = TRUE)) install.packages("pwr")

library(pwr)
library(dplyr)  # provides %>%, filter() and mutate() used below

# Filter the dataset for successful and failed transactions
success_data <- data %>% filter(transaction_status == "Successful")
failed_data <- data %>% filter(transaction_status == "Failed")

# Check the sample sizes
n_success <- nrow(success_data)
n_failed <- nrow(failed_data)

# Print sample sizes
cat(sprintf("Number of Successful Transactions: %d\n", n_success))

## Number of Successful Transactions: 4755

cat(sprintf("Number of Failed Transactions: %d\n", n_failed))

## Number of Failed Transactions: 146

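# Effect size and power analysis
# Note: pwr.t.test() defaults to a two-sided alternative, which gives a
# slightly larger (conservative) n than the one-sided test used below needs
effect_size <- 0.5  # moderate effect size (Cohen's d)
power_test <- pwr.t.test(d = effect_size, power = 0.8, sig.level = 0.05, type = "two.sample")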

# Print required sample size
cat(sprintf("Required Sample Size per Group: %d\n", ceiling(power_test$n)))

## Required Sample Size per Group: 64
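The required sample size above comes from a two-sided power calculation, while the t-test that follows is one-sided. As a rough cross-check, the same calculation can be sketched with base R's power.t.test(), which needs no extra package. Setting delta = 0.5 with sd = 1 is an assumption that mirrors the standardized effect size d = 0.5, and the variable names are illustrative.

```r
# Cross-check of the power analysis using base R (stats package).
# Assumption: delta / sd = 0.5 plays the role of Cohen's d = 0.5.
two_sided <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05,
                          power = 0.8, type = "two.sample",
                          alternative = "two.sided")
one_sided <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05,
                          power = 0.8, type = "two.sample",
                          alternative = "one.sided")

# Required sample size per group, rounded up
n_two_sided <- ceiling(two_sided$n)
n_one_sided <- ceiling(one_sided$n)

cat(sprintf("Two-sided test: %d per group\n", n_two_sided))
cat(sprintf("One-sided test: %d per group\n", n_one_sided))
```

The one-sided requirement is smaller than the two-sided one, so the two-sided figure acts as a conservative threshold. Both observed groups (4755 and 146) exceed it.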

# Perform t-test if sample size is sufficient
if (n_success >= power_test$n && n_failed >= power_test$n) {
  t_test_result <- t.test(success_data$transaction_fee, failed_data$transaction_fee, 
                          alternative = "less", var.equal = FALSE)
  
  # Print t-test results
  cat("T-Test Results:\n")
  cat(sprintf("t = %.4f\n", t_test_result$statistic))
  cat(sprintf("Degrees of Freedom = %.2f\n", t_test_result$parameter))
  cat(sprintf("p-value = %.4f\n", t_test_result$p.value))
  
  # Interpret the results (one-sided test: H1 is "successful fees are lower")
  if (t_test_result$p.value < 0.05) {
    cat("Conclusion: We reject the null hypothesis. Successful transactions have significantly lower transaction fees than failed transactions.\n")
  } else {
    cat("Conclusion: We fail to reject the null hypothesis. There is no evidence that successful transactions have lower transaction fees.\n")
  }
} else {
  cat("Conclusion: Not enough data for hypothesis testing.\n")
}

## T-Test Results:
## t = 0.6124
## Degrees of Freedom = 153.65
## p-value = 0.7294
## Conclusion: We fail to reject the null hypothesis. There is no evidence that successful transactions have lower transaction fees.

Interpretation of Results: P-value = 0.7294: This p-value is far above the significance level (α = 0.05).

We fail to reject the null hypothesis since the p-value is greater than 0.05. The test statistic is positive (t = 0.6124), so the sample mean fee for successful transactions is in fact slightly higher than for failed transactions, which is the opposite direction to H1. Based on the data available, there is no statistically significant evidence that successful transactions carry lower fees. Failing to reject H0 does not prove that the fees are equal, but this test gives no indication that transaction fees are associated with whether a transaction succeeds or fails.
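The reported p-value can be reproduced from the test statistic and degrees of freedom in the output above, which makes the effect of the one-sided alternative explicit. The sketch below uses only the rounded values from that output (t = 0.6124, df = 153.65), so it agrees with the output only up to rounding. The variable names are illustrative.

```r
# Values copied (rounded) from the t-test output
t_stat <- 0.6124
df     <- 153.65

# alternative = "less": the p-value is the lower-tail probability P(T <= t)
p_less <- pt(t_stat, df = df)

# For comparison, the two-sided p-value doubles the smaller tail
p_two_sided <- 2 * pt(-abs(t_stat), df = df)

cat(sprintf("One-sided (less) p-value: %.4f\n", p_less))
cat(sprintf("Two-sided p-value:        %.4f\n", p_two_sided))
```

Because t is positive, the lower-tail probability exceeds 0.5. A two-sided test on the same statistic would also fail to reject H0 at α = 0.05.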

Hypothesis 2: Does Cashback Impact Final Amount Paid?

We will use Pearson’s correlation to test the linear relationship between cashback and final amount, since both are continuous variables. The p-value tells us whether the observed correlation differs significantly from zero.

Null Hypothesis (H0): Cashback received does not significantly affect the final amount paid by the customer.

Alternative Hypothesis (H1): Higher cashback results in a significantly lower final amount paid by the customer.
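For reference, Pearson's correlation coefficient and the t statistic that cor.test() uses to test whether it is zero can be written as below. Here x denotes cashback, y denotes final amount, and n is the number of complete observation pairs. This notation is introduced for illustration.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \sim t_{n-2} \ \text{under } H_0:\ \rho = 0
```

By default cor.test() reports a two-sided p-value. A directional alternative such as the H1 stated here could instead be tested by passing alternative = "less".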

# Calculate final_amount by adjusting cashback and transaction_fee
data <- data %>%
  mutate(final_amount = product_amount - cashback + transaction_fee)

# Perform Pearson correlation test between cashback and final amount
correlation_test <- cor.test(data$cashback, data$final_amount, method = "pearson")

# Display only the key results (correlation coefficient and p-value) in a clean format
cat("Correlation between Cashback and Final Amount:\n")

## Correlation between Cashback and Final Amount:

cat("Correlation Coefficient: ", round(correlation_test$estimate, 5), "\n")

## Correlation Coefficient:  -0.00438

cat("P-value: ", round(correlation_test$p.value, 5), "\n")

## P-value:  0.75674

# Interpretation based on p-value
if (!is.na(correlation_test$p.value)) {
  if (correlation_test$p.value < 0.05) {
    cat("Conclusion: We reject the null hypothesis. Cashback has a significant impact on the final amount paid.\n")
  } else {
    cat("Conclusion: We fail to reject the null hypothesis. Cashback does not significantly affect the final amount paid.\n")
  }
} else {
  cat("Unable to calculate the correlation. Check the dataset for missing or incorrect values.\n")
}

## Conclusion: We fail to reject the null hypothesis. Cashback does not significantly affect the final amount paid.

Interpretation of Results:

P-value: 0.75674 (greater than 0.05). Since the p-value is far above the significance level of 0.05, we fail to reject the null hypothesis. There is not enough evidence to conclude that cashback has a significant impact on the final amount paid. With a correlation coefficient of -0.00438, cashback and final amount show essentially no linear relationship in this data.

Reasons:

1. The lack of a strong relationship between cashback and the final amount paid makes sense intuitively: cashback is typically a small proportion of the total transaction, so its effect on the final amount is likely swamped by variation in the product amount.

2. In most cases, cashback rewards are intended to offer an incentive, not to drastically alter the final amount paid by the customer. They function as a small rebate rather than a substantial discount. This aligns with the finding that cashback does not significantly affect the overall amount spent.
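Reason 1 can be illustrated with a small simulation. Because final_amount is computed as product_amount - cashback + transaction_fee, cashback enters it with coefficient -1. If the three components were independent, the correlation would equal -sd(cashback) / sd(final_amount). The sketch below uses entirely synthetic data to show how a wide spread in product amounts pushes that correlation toward zero. The uniform ranges, sample size, and seed are assumptions, not properties of the real dataset.

```r
set.seed(42)  # arbitrary seed for reproducibility
n <- 5000     # hypothetical number of transactions

# Synthetic, independent components (ranges are illustrative assumptions)
product_amount  <- runif(n, min = 10, max = 10000)
cashback        <- runif(n, min = 0,  max = 100)
transaction_fee <- runif(n, min = 0,  max = 50)

# Same construction as in the analysis above
final_amount <- product_amount - cashback + transaction_fee

# Observed correlation vs. the value implied by independence:
# cor = -sd(cashback) / sqrt(var(product) + var(cashback) + var(fee))
r_observed <- cor(cashback, final_amount)
r_expected <- -sd(cashback) /
  sqrt(var(product_amount) + var(cashback) + var(transaction_fee))

cat(sprintf("Observed correlation: %.4f\n", r_observed))
cat(sprintf("Expected under independence: %.4f\n", r_expected))
```

In this synthetic setup the implied correlation is on the order of -0.01, so even a real mechanical link between cashback and final amount is hard to detect when product amounts vary widely.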

Visualization for Hypothesis 1:

# Load necessary library
library(ggplot2)

# Combine success and failed data for visualization
transaction_data <- rbind(
  mutate(success_data, transaction_status = "Successful"),
  mutate(failed_data, transaction_status = "Failed")
)

# Create boxplot for transaction fees
ggplot(transaction_data, aes(x = transaction_status, y = transaction_fee, fill = transaction_status)) +
  geom_boxplot() +
  labs(title = "Transaction Fee by Transaction Status",
       x = "Transaction Status",
       y = "Transaction Fee") +
  theme_minimal() +
  scale_fill_manual(values = c("Successful" = "lightgreen", "Failed" = "salmon")) +
  theme(legend.position = "none")

The median transaction fees for successful and failed transactions are close. This is consistent with the t-test result, which found no evidence of lower fees for successful transactions.

Visualization for Hypothesis 2:

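# Bin cashback into categories (values matching no condition, e.g. NA, stay NA)
data <- data %>%
  mutate(
    cashback_category = case_when(
      cashback == 0 ~ "None",
      cashback > 0 & cashback <= 10 ~ "Low",
      cashback > 10 & cashback <= 50 ~ "Medium",
      cashback > 50 ~ "High"
    ),
    # Fix the level order so the plot reads None, Low, Medium, High
    cashback_category = factor(cashback_category,
                               levels = c("None", "Low", "Medium", "High"))
  )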

# Calculate average final amount for each cashback category
avg_final_amount <- data %>%
  group_by(cashback_category) %>%
  summarise(mean_final_amount = mean(final_amount, na.rm = TRUE))

# Bar plot for average final amount by cashback category
ggplot(avg_final_amount, aes(x = cashback_category, y = mean_final_amount, fill = cashback_category)) +
  geom_col() +
  labs(title = "Average Final Amount by Cashback Category",
       x = "Cashback Categories",
       y = "Average Final Amount Paid") +
  theme_minimal() +
  scale_fill_brewer(palette = "Set3") +
  theme(legend.position = "none")

The bars are similar in height, suggesting that the average final amount paid does not differ much across cashback categories. This is consistent with the correlation test, although a bar chart of means alone cannot establish statistical significance.
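The binning used for the bar chart can also be sketched with base R's cut(), which returns factor levels in bin order directly. This is an alternative to the case_when() call above, not the code used to produce the plot, and the example vector is made up. One difference to note: the lowest bin here is (-Inf, 0], so it would also absorb any negative cashback values.

```r
# Hypothetical cashback values chosen to hit each bin boundary
cashback_example <- c(0, 5, 10, 10.5, 50, 51)

# Right-closed intervals: (-Inf, 0], (0, 10], (10, 50], (50, Inf)
cashback_category <- cut(
  cashback_example,
  breaks = c(-Inf, 0, 10, 50, Inf),
  labels = c("None", "Low", "Medium", "High")
)

# Levels are stored in bin order, so plots keep None, Low, Medium, High
print(data.frame(cashback = cashback_example, category = cashback_category))
```

With this approach the category order on the x-axis follows the bins rather than the alphabet.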

Insights and Significance from Hypothesis 1

Insights: The results give no evidence that transaction fees are lower for successful transactions than for failed ones, so fees do not appear to be a determining factor in whether a transaction succeeds. Other factors, such as network issues or technical glitches, may play a more important role in transaction success rates, though this analysis does not test them. The similar fee levels across outcomes are also consistent with a fee structure that is applied uniformly regardless of transaction outcome.

Significance: This finding matters because it suggests that transaction fees are unlikely to contribute to transaction failures, which may help digital wallet platforms focus on optimizing other aspects of their system, such as technical reliability.

Insights and Significance from Hypothesis 2

Insights: The correlation between cashback and the final amount paid is very close to zero (r = -0.00438), meaning that cashback is not a useful linear predictor of how much a customer ends up paying. Users who receive more cashback do not pay a noticeably different final amount once other transaction factors (such as product cost and transaction fees) are included. Cashback may be perceived by users more as a reward or incentive than as something that significantly alters the amount they pay at the time of purchase. This could indicate that cashback promotions are more about driving user engagement or loyalty than about reducing immediate transaction costs.

Significance: This insight is important for businesses because it suggests that while cashback is often used as an incentive, it doesn’t drastically reduce the final amount users pay. Therefore, cashback programs might serve better as marketing tools than as cost-saving features.