Creating Pair 1

The response variable final_amount represents the total amount the customer paid after applying cashback and transaction fees. It is calculated using the formula: \[\text{final\_amount} = \text{product\_amount} - \text{cashback} + \text{transaction\_fee}\]

where product_amount is the original price before any modifications, cashback is the refund provided to the customer, and transaction_fee represents any additional fees applied during the transaction. This variable gives a comprehensive view of the net payment made after adjusting for these factors.

# Load dplyr, which provides %>% and mutate()
library(dplyr)
# Creating final_amount (product_amount after cashback and fees)
data <- data %>%
  mutate(final_amount = product_amount - cashback + transaction_fee)
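
As a quick sanity check of the formula, the sketch below applies it to a small made-up data frame. The three rows and their values are illustrative assumptions, not taken from the data set; base R is used so the snippet runs without extra packages.

```r
# Toy data (hypothetical values, for illustration only)
toy <- data.frame(
  product_amount  = c(1000, 2500, 400),
  cashback        = c(50, 0, 20),
  transaction_fee = c(10, 25, 0)
)

# Same formula as above: price minus cashback plus fee
toy$final_amount <- with(toy, product_amount - cashback + transaction_fee)

print(toy$final_amount)
# expected values: 960, 2525, 380
```

Cashback lowers the amount paid and the fee raises it, so a row with no cashback (the second one) ends up above its product amount.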

Creating Pair 2

The total_rewards variable represents the total benefits received by the customer from the transaction, which includes both the cashback (the amount refunded to the customer) and the loyalty points (rewarded points). This column gives a complete view of the total rewards provided to the customer as part of the transaction incentives.

# Creating total_rewards (combining cashback and loyalty points)
data <- data %>%
  mutate(total_rewards = cashback + loyalty_points)

Plotting the Relationship Between Variables

Plot for Pair 1

library(ggplot2)

# Scatter plot for final_amount vs product_amount
ggplot(data, aes(x = product_amount, y = final_amount)) +
  geom_point(alpha = 0.4) + # Plotting the individual transactions
  geom_smooth(method = "lm", se = FALSE, color = "red") + # Adding a trend line
  labs(title = "Final Amount vs. Product Amount",
       x = "Product Amount",
       y = "Final Amount (after cashback and fees)") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

The scatter plot shows a nearly perfect linear trend, with points closely aligned along a straight upward line, confirming that higher product amounts consistently lead to proportionally higher final amounts.
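
One way to check the "proportional" reading of the plot is to fit the same straight line that geom_smooth(method = "lm") draws and look at its slope. The sketch below does this on simulated data, since the real data frame is not reproduced here; the distributions and the 2% cashback and 1% fee rates are assumptions chosen only for illustration.

```r
set.seed(42)

# Simulated stand-in for the real data (all values hypothetical)
n <- 500
sim <- data.frame(product_amount = runif(n, min = 100, max = 10000))
sim$cashback        <- 0.02 * sim$product_amount * runif(n)  # up to 2% back
sim$transaction_fee <- 0.01 * sim$product_amount * runif(n)  # up to 1% fee
sim$final_amount    <- sim$product_amount - sim$cashback + sim$transaction_fee

# Same model that geom_smooth(method = "lm") fits: y ~ x
fit   <- lm(final_amount ~ product_amount, data = sim)
slope <- unname(coef(fit)["product_amount"])

print(coef(fit))  # intercept and slope of the trend line
print(cor(sim$product_amount, sim$final_amount))
```

A slope close to 1 means the final amount moves almost one-for-one with the product amount, which is what a near-perfect straight line in the plot corresponds to.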

Boxplot

library(ggplot2)

# Create binned categories for product_amount
data$product_amount_binned <- cut(data$product_amount, breaks = 5)  # Adjust number of bins as needed

# Create the box plot
ggplot(data, aes(x = product_amount_binned, y = final_amount)) +
  geom_boxplot(fill = "lightgreen", color = "black") +
  labs(title = "Final Amount vs. Product Amount (Binned)",
       x = "Product Amount (Binned Categories)",
       y = "Final Amount (after cashback and fees)") +
  theme_minimal()

The box plot shows no outliers; any that did occur would be minimal, reflecting how little the final amount deviates from the expected relationship with the product amount. This close alignment is what we expect given such a strong linear connection between the two variables.
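
Box plots conventionally flag as outliers the points lying more than 1.5 times the interquartile range beyond the box. The sketch below applies that rule per bin with base R's boxplot.stats() on simulated data (hypothetical values). Note that boxplot.stats() computes the hinges slightly differently from the quartiles used by geom_boxplot(), so the counts can differ marginally from what the plot shows.

```r
set.seed(1)

# Simulated stand-in for the real columns (hypothetical values)
product_amount <- runif(300, min = 100, max = 10000)
final_amount   <- product_amount * runif(300, min = 0.98, max = 1.01)

# Same binning as in the plot: five equal-width bins
bins <- cut(product_amount, breaks = 5)

# Count points beyond 1.5 * IQR from the hinges in each bin
outliers_per_bin <- tapply(
  final_amount, bins,
  function(v) length(boxplot.stats(v)$out)
)
print(outliers_per_bin)
```

Running the same tapply() call on the real final_amount and product_amount_binned columns would give a numeric confirmation of what the box plot suggests visually.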

Plot for Pair 2

library(ggplot2)

ggplot(data, aes(x = loyalty_points, y = total_rewards)) +
  geom_point(alpha = 0.4) +  # Plotting the individual transactions
  geom_smooth(method = "lm", se = FALSE, color = "blue") +  # Adding a linear trend line
  labs(title = "Total Rewards vs. Loyalty Points",
       x = "Loyalty Points",
       y = "Total Rewards") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

The scatter plot reveals a strong upward trend, with points clustering tightly around a linear path, indicating that as loyalty points increase, total rewards rise almost predictably and consistently.

Boxplot

library(ggplot2)

# Create binned categories for loyalty_points
data$loyalty_points_binned <- cut(data$loyalty_points, breaks = 5)  # Adjust the number of breaks as needed

# Create the box plot
ggplot(data, aes(x = loyalty_points_binned, y = total_rewards)) +
  geom_boxplot(fill = "lightblue", color = "black") +
  labs(title = "Total Rewards vs. Loyalty Points (Binned)",
       x = "Loyalty Points (Binned Categories)",
       y = "Total Rewards") +
  theme_minimal()

The box plot shows no points beyond the whiskers, which means there are no outliers. The lack of significant outliers or variability in the boxes supports the idea that customers who earn more points are consistently rewarded with higher total rewards.

Calculating Correlation Coefficient for Each Pair

Correlation for Pair 1

We will calculate the correlation coefficient to quantify the strength of the linear relationship between final_amount and product_amount. This value helps us understand how closely related the two variables are.

cor(data$product_amount, data$final_amount, use = "complete.obs")
## [1] 0.9999383

The correlation coefficient of 0.9999383 is extremely high, indicating a very strong positive linear relationship between product_amount and final_amount. In simpler terms, when the product amount goes up, the final amount paid goes up almost in perfect sync. This is largely expected: final_amount is computed directly from product_amount, so a coefficient this close to 1 tells us that the net adjustment from cashback and transaction fees is either small or itself nearly proportional to the product amount.

The scatter plot above shows this as a clear upward trend, where each increase in product amount corresponds to a matching increase in final amount. So, if you know the product amount, you can predict what the customer ends up paying with very little error.
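
For reference, cor() with its default method computes the Pearson coefficient: the covariance of the two variables divided by the product of their standard deviations. The sketch below reproduces that calculation by hand on a small made-up pair of vectors (hypothetical values, not from the data set) and compares it with the built-in result.

```r
# Hypothetical values, for illustration only
x <- c(120, 450, 980, 1500, 2300)
y <- c(118, 455, 970, 1510, 2290)

# Pearson r = cov(x, y) / (sd(x) * sd(y))
r_manual  <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y)

print(c(manual = r_manual, builtin = r_builtin))
```

The two numbers agree up to floating-point error, which is a useful reminder that the coefficient measures only how well a straight line describes the pair.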

Correlation for Pair 2

The correlation coefficient between loyalty_points and total_rewards is:

cor(data$loyalty_points, data$total_rewards, use = "complete.obs")
## [1] 0.9951798

With a correlation coefficient of 0.9951798, we see another very strong positive correlation, this time between loyalty_points and total_rewards. This means that as customers accumulate more loyalty points, they generally earn more rewards in return. It’s almost like a reward system that’s designed to ensure that the more engaged you are with the program, the more you get back.

Scatter plot 2 shows a distinct upward trend: higher loyalty points usually mean higher total rewards. Because total_rewards is defined as cashback plus loyalty_points, some of this relationship is built in, but the tight fit still indicates that rewards track points in a consistent way. So, if you're a loyal customer racking up points, you can feel confident that those points translate into rewards. Overall, this strong correlation is what you'd expect from a well-structured loyalty program.
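
Part of this strength follows from how total_rewards is defined. Writing \(L\) for loyalty_points and \(C\) for cashback (symbols introduced here only for brevity), total_rewards is \(C + L\), so the coefficient can be expanded as

\[r_{L,\,C+L} = \frac{\operatorname{Cov}(L,\, C + L)}{\sigma_L \, \sigma_{C+L}} = \frac{\sigma_L^2 + \operatorname{Cov}(L, C)}{\sigma_L \sqrt{\sigma_L^2 + \sigma_C^2 + 2\operatorname{Cov}(L, C)}}\]

Under the additional assumption, not verified here, that cashback and loyalty points are uncorrelated, this reduces to

\[r_{L,\,C+L} = \frac{1}{\sqrt{1 + \sigma_C^2 / \sigma_L^2}}\]

so the coefficient approaches 1 whenever the spread of loyalty points is much larger than the spread of cashback. Under that same assumption, the observed value of 0.9951798 would correspond to a ratio \(\sigma_C / \sigma_L\) of roughly 0.1.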

Building Confidence Intervals for the Response Variables

Confidence Interval for Final Amount

library(ggplot2)

# Calculate mean and confidence interval for final_amount
mean_final_amount <- mean(data$final_amount, na.rm = TRUE)
sd_final_amount <- sd(data$final_amount, na.rm = TRUE)
n_final_amount <- length(data$final_amount[!is.na(data$final_amount)])

# Calculate the confidence interval
Z <- 1.96
CI_final_amount <- mean_final_amount + c(-1, 1) * Z * (sd_final_amount / sqrt(n_final_amount))

# Create a data frame for plotting mean and CI
final_amount_ci <- data.frame(
  mean = mean_final_amount,
  lower = CI_final_amount[1],
  upper = CI_final_amount[2]
)
# Print the CI values
cat("Mean Final Amount:", round(mean_final_amount, 2), "\n")
## Mean Final Amount: 4932.03
cat("95% Confidence Interval: [", round(CI_final_amount[1], 2), ", ", round(CI_final_amount[2], 2), "]\n")
## 95% Confidence Interval: [ 4852.06 ,  5012 ]
# Violin plot
ggplot(data, aes(x = "", y = final_amount)) +
  geom_violin(fill = "blue", alpha = 0.4) +
  geom_point(data = final_amount_ci, aes(y = mean), color = "black", size = 3) + # Correctly reference mean
  geom_errorbar(data = final_amount_ci, aes(y = mean, ymin = lower, ymax = upper), 
                width = 0.2, color = "red") + # Reference the error bars correctly
  labs(title = "Violin Plot with 95% Confidence Interval for Final Amount",
       y = "Final Amount (after cashback and fees)",
       x = "") +
  theme_minimal()

The mean final amount is 4932.03, with a 95% confidence interval of [4852.06, 5012.00]. In other words, we can be 95% confident that the true mean final amount paid by customers lies within this range. The interval is narrow relative to the mean (about ±80, or roughly 1.6% of the mean), which suggests a high degree of precision in the estimate. Note that this narrowness reflects the standard error, which depends on the sample size as well as the spread; the spread of individual final amounts is what the violin plot shows. The average final amount is substantial, but the interval by itself describes only what customers pay on average after cashback and fees; whether the cashback policy improves customer satisfaction would need to be checked against satisfaction or retention data.
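
The interval above uses the normal critical value 1.96. A minimal sketch of the same calculation wrapped in a helper, with the t-based interval from t.test() alongside for comparison, is shown below. The function name mean_ci and the simulated values are assumptions for illustration; with a large sample the two intervals nearly coincide.

```r
# Hypothetical helper: normal-approximation CI for a mean
mean_ci <- function(x, level = 0.95) {
  x  <- x[!is.na(x)]                  # drop missing values
  z  <- qnorm(1 - (1 - level) / 2)    # about 1.96 for level = 0.95
  se <- sd(x) / sqrt(length(x))       # standard error of the mean
  c(mean = mean(x), lower = mean(x) - z * se, upper = mean(x) + z * se)
}

set.seed(7)
# Simulated amounts (hypothetical parameters) with one missing value
sim_amount <- c(rnorm(1000, mean = 5000, sd = 1300), NA)

z_ci <- mean_ci(sim_amount)
t_ci <- t.test(sim_amount, conf.level = 0.95)$conf.int  # t-based interval

print(z_ci)
print(t_ci)
```

For small samples the t-based interval is the safer default, since it is slightly wider to account for estimating the standard deviation from the data.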

Confidence Interval for Total Rewards

# Calculate mean and confidence interval for total_rewards
mean_total_rewards <- mean(data$total_rewards, na.rm = TRUE)
sd_total_rewards <- sd(data$total_rewards, na.rm = TRUE)
n_total_rewards <- length(data$total_rewards[!is.na(data$total_rewards)])

# Calculate the confidence interval
CI_total_rewards <- mean_total_rewards + c(-1, 1) * Z * (sd_total_rewards / sqrt(n_total_rewards))

# Create a data frame for plotting mean and CI
total_rewards_ci <- data.frame(
  mean = mean_total_rewards,
  lower = CI_total_rewards[1],
  upper = CI_total_rewards[2]
)
# Print the CI values
cat("Mean Total Rewards:", round(mean_total_rewards, 2), "\n")
## Mean Total Rewards: 549.45
cat("95% Confidence Interval: [", round(CI_total_rewards[1], 2), ", ", round(CI_total_rewards[2], 2), "]\n")
## 95% Confidence Interval: [ 541.39 ,  557.51 ]
# Violin plot
ggplot(data, aes(x = "", y = total_rewards)) +
  geom_violin(fill = "green", alpha = 0.4) +
  geom_point(data = total_rewards_ci, aes(y = mean), color = "black", size = 3) + # Correctly reference mean
  geom_errorbar(data = total_rewards_ci, aes(y = mean, ymin = lower, ymax = upper), 
                width = 0.2, color = "red") + # Reference the error bars correctly
  labs(title = "Violin Plot with 95% Confidence Interval for Total Rewards",
       y = "Total Rewards",
       x = "") +
  theme_minimal()

The mean total rewards value is 549.45, with a 95% confidence interval of [541.39, 557.51]. This suggests that we can be 95% confident that the true mean of total rewards earned by customers lies within this range. The interval is narrow (about ±8, or roughly 1.5% of the mean), so the mean is estimated precisely; this reflects the standard error rather than the variability of rewards across individual customers, which is shown by the violin plot. The positive mean indicates that customers do receive rewards on a typical transaction, which is consistent with a program intended to foster loyalty and retention.
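
When reading the width of these intervals, it helps to keep in mind that the standard error sd / sqrt(n) shrinks as the sample grows even if the spread of individual values stays the same. The sketch below illustrates this with two simulated samples drawn from the same distribution (hypothetical parameters), differing only in size.

```r
set.seed(123)

# Half-width of a 95% normal-approximation CI for the mean
ci_half_width <- function(x) 1.96 * sd(x) / sqrt(length(x))

# Same population (mean 550, sd 130), two sample sizes; values are hypothetical
small_sample <- rnorm(100,   mean = 550, sd = 130)
large_sample <- rnorm(10000, mean = 550, sd = 130)

print(c(sd_small = sd(small_sample), sd_large = sd(large_sample)))
print(c(width_small = ci_half_width(small_sample),
        width_large = ci_half_width(large_sample)))
```

The two standard deviations come out similar while the interval for the larger sample is far narrower, so a tight interval is evidence about the precision of the mean, not about how alike individual customers are.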

Conclusion: Both response variables, the final amount and total rewards, have means that are estimated with tight confidence intervals, and both are very strongly correlated with their explanatory variables. These results are consistent with a cashback and rewards system that is applied consistently across transactions. The confidence intervals support the reliability of the estimated means; linking them to customer engagement or satisfaction would require additional data.