Data Dive — Hypothesis Testing

Hypothesis 1: The Distribution of Subcategories within the ‘Fruits & Veggies’ Category is Uniform

Hypothesis:
Null Hypothesis : The distribution of subcategories within the “Fruits & Veggies” category is uniform.
Alternative Hypothesis : The subcategory “Fresh Fruits” has a different frequency compared to other subcategories within the “Fruits & Veggies” category.

Alpha Level (α = 0.05): Choosing an alpha level of 0.05 as it is most widely used in hypothesis testing.
A higher alpha level increases the likelihood of finding a significant result, but it also increases the risk of a Type I error, hence we are sticking to 0.05.
Power Level (0.8): In the context of supermarket grocery sales, where we are interested in identifying meaningful patterns, trends, or differences, having a reasonable chance of detecting a true effect is crucial for the validity and relevance of our findings. Power is the probability of detecting a true effect, so it is the complement of the Type II (false negative) error rate, while alpha controls the Type I (false positive) error rate. Aiming for a power of 0.80 strikes a balance, as it accepts a 20% chance of not detecting a true effect, which is often deemed reasonable in practice.
Minimum Effect Size (0.1): 0.1 is chosen because even a relatively small difference in the frequency of “Fresh Fruits” compared to other subcategories could be practically meaningful. If the minimum effect size were set higher, only a substantial difference in frequency would count as practically important. By setting it to 0.1, we focus on detecting differences that, while small, could matter in real-world decision-making.
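To make the power and effect-size choices concrete, the sketch below estimates how many observations a chi-square goodness-of-fit test would need to reach them. It rests on two assumptions that the text does not state: that the minimum effect size of 0.1 is read as Cohen's w, and that there are four subcategories (df = 3). The helper name `gof_power` is hypothetical. Only base R and the noncentral chi-square distribution are used.

```r
# Power of a chi-square goodness-of-fit test for a given sample size.
# Assumption: the effect size w is Cohen's w, so the noncentrality is n * w^2.
gof_power <- function(n, w, df, alpha = 0.05) {
  critical_value <- qchisq(1 - alpha, df = df)
  pchisq(critical_value, df = df, ncp = n * w^2, lower.tail = FALSE)
}

# Search a grid of sample sizes for the smallest one that reaches power 0.8.
candidate_n <- 100:5000
power_by_n <- sapply(candidate_n, gof_power, w = 0.1, df = 3)
required_n <- candidate_n[which(power_by_n >= 0.8)[1]]
print(required_n)
```

Under these assumptions the required sample is on the order of a thousand observations, which gives a yardstick for judging whether the “Fruits & Veggies” subset is large enough.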

Performing a Neyman-Pearson hypothesis test

Hypothesis:
H0: The distribution of subcategories within the “Fruits & Veggies” category is uniform.
H1: The subcategory “Fresh Fruits” has a different frequency compared to other subcategories within the “Fruits & Veggies” category.

Using a chi-square goodness-of-fit test, as the data is categorical and we are comparing the observed subcategory counts with the equal counts expected under a uniform distribution.

Values:

Alpha Level (α): 0.05
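Power Level: 0.8. It is not an input to chisq.test; it is used to judge whether the sample is large enough to detect the minimum effect size.
Minimum Effect Size: 0.1. It is likewise not an input to chisq.test.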

df_fruits_veggies <- data[data$Category == 'Fruits & Veggies', ]

# Creating a contingency table
contingency_table <- table(df_fruits_veggies$SubCategory)

# Calculating the total number of observations
total_obs <- sum(contingency_table)

# Calculating the expected proportions under the null hypothesis
expected_proportions <- rep(1/length(contingency_table), length(contingency_table))

# Performing the chi-square test
chi_square_result <- chisq.test(contingency_table, p = expected_proportions)
print(chi_square_result)

## 
##  Chi-squared test for given probabilities
## 
## data:  contingency_table
## X-squared = 0.87165, df = 3, p-value = 0.8323

The p-value obtained is 0.8323, which is well above the alpha level of 0.05, so we fail to reject the null hypothesis. There is not enough evidence to conclude that the distribution of subcategories within the “Fruits & Veggies” category differs from a uniform distribution. In simpler terms, the observed subcategory frequencies are consistent with each subcategory being equally common.
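The decision can be checked directly from the reported statistic. The short sketch below takes X-squared = 0.87165 and df = 3 from the output above and recomputes the critical value and the p-value with base R; the variable names are illustrative.

```r
# Values copied from the chisq.test output above.
observed_statistic <- 0.87165
degrees_of_freedom <- 3
alpha <- 0.05

# Reject H0 only if the statistic exceeds this critical value.
critical_value <- qchisq(1 - alpha, df = degrees_of_freedom)

# Upper-tail probability of the chi-square distribution at the statistic.
p_value <- pchisq(observed_statistic, df = degrees_of_freedom, lower.tail = FALSE)

cat("critical value:", round(critical_value, 3), "\n")
cat("p-value:", round(p_value, 4), "\n")
cat("reject H0:", observed_statistic > critical_value, "\n")
```

The statistic falls far below the critical value, which is the same decision expressed on the scale of the test statistic instead of the p-value.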

Performing a Fisher’s style test for significance on the same hypothesis

df_fruits_veggies <- data[data$Category == 'Fruits & Veggies', ]

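# Building the table of counts. The printed table below has a single level,
# so levels() evidently returned NULL here and only "Fresh Fruits" was kept.
contingency_table <- table(factor(df_fruits_veggies$SubCategory, levels = c("Fresh Fruits", levels(df_fruits_veggies$SubCategory))))

# Performing Fisher's exact test on a 2x2 matrix padded with zeros:
# (Fresh Fruits count, 0, 0, 0). The other subcategory counts are not included.
fisher_test_result <- fisher.test(matrix(c(contingency_table, rep(0, 3)), ncol = 2))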

print(contingency_table)

## 
## Fresh Fruits 
##          369

print(fisher_test_result)

## 
##  Fisher's Exact Test for Count Data
## 
## data:  matrix(c(contingency_table, rep(0, 3)), ncol = 2)
## p-value = 1
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##    0 Inf
## sample estimates:
## odds ratio 
##          0

alpha <- 0.05
if (fisher_test_result$p.value < alpha) {
  print("Reject the null hypothesis. There is evidence that 'Fresh Fruits' has a different frequency than other subcategories.")
} else {
  print("Fail to reject the null hypothesis. The data does not provide enough evidence to conclude that 'Fresh Fruits' has a different frequency than other subcategories.")
}

## [1] "Fail to reject the null hypothesis. The data does not provide enough evidence to conclude that 'Fresh Fruits' has a different frequency than other subcategories."

- The table shows that there are 369 occurrences of the subcategory “Fresh Fruits” within the “Fruits & Veggies” category. It contains no counts for the other subcategories.
- The matrix passed to Fisher’s exact test therefore holds 369 and three zeros. With these margins only one arrangement of counts is possible, so the p-value is necessarily 1, and the odds ratio and its confidence interval (0 to Inf) carry no information.
- The p-value is greater than alpha (1 > 0.05), so the code reports a failure to reject the null hypothesis, but this outcome reflects how the table was built rather than the data.

In simpler terms, this test as constructed cannot tell us whether “Fresh Fruits” is more or less frequent than the other subcategories in the “Fruits & Veggies” category. The chi-square test above remains the evidence for this hypothesis, and it found the observed frequencies consistent with what would be expected under the assumption of no difference.
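One exact test that does compare “Fresh Fruits” against the remaining subcategories is a binomial test of its share against 1/k, where k is the number of subcategories. This is offered as a possible alternative, not as the method used in this analysis. The helper name `share_test` and the demo data are hypothetical; the demo vector is simulated and is not the Supermart data, so no result for the real dataset is implied.

```r
# Hypothetical helper: exact binomial test of whether one subcategory's
# share equals 1/k, where k is the number of distinct subcategories.
share_test <- function(subcategories, target) {
  k <- length(unique(subcategories))
  binom.test(x = sum(subcategories == target),
             n = length(subcategories),
             p = 1 / k)
}

# Simulated labels for illustration only (not the Supermart counts).
set.seed(42)
demo_subcategories <- sample(
  c("Fresh Fruits", "Subcategory B", "Subcategory C", "Subcategory D"),
  size = 400, replace = TRUE
)

result <- share_test(demo_subcategories, "Fresh Fruits")
print(result$p.value)
```

On the real data the call would be `share_test(df_fruits_veggies$SubCategory, "Fresh Fruits")`, assuming the SubCategory column holds one label per row as in the code above.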

Visualization to illustrate the results of hypothesis 1

library(ggplot2)

# Subset the data for the relevant category
df_fruits_veggies <- data[data$Category == 'Fruits & Veggies', ]

# Create a bar plot
ggplot(df_fruits_veggies, aes(x = SubCategory, fill = SubCategory)) +
  geom_bar() +
  theme_minimal() +
  labs(title = "Distribution of Subcategories in 'Fruits & Veggies'",
       x = "Subcategory",
       y = "Frequency") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

- The p-value from the chi-squared test was high (0.8323). Fisher’s exact test reported 1, but it was run on a table holding only the “Fresh Fruits” count, so the chi-squared result is the informative one. A high p-value means that, based on the data we have, there isn’t strong evidence that “Fresh Fruits” has a significantly different frequency compared to other subcategories.

- However, the bar plot shows that the bar for “Fresh Fruits” is a bit higher than the others. This suggests, at a glance, that “Fresh Fruits” might be more frequent.

Interpretation:

- The statistical tests have limited power to detect small differences, so a small gap in bar heights can fall within random variation.
- Even though the bar plot hints at a difference, the statistical tests call for caution.
- The decision not to reject the null hypothesis is based on the overall evidence.

To put it in simpler terms, the tests were looking for evidence and didn’t find enough to confidently say “Fresh Fruits” is different. The visual might suggest a difference, but it’s not strong enough to support a firm conclusion. So, for now, we don’t have enough proof to say “Fresh Fruits” is significantly different from the others.

To increase the evidence and get a stronger conclusion that “Fresh Fruits” is significantly different from others, we can do the following:
- Collect data from a larger sample if possible. More data can provide a clearer picture and increase the power of statistical tests.
- Conduct the same analysis on different datasets or subsets to see if the pattern holds consistently.
- Investigate if there are other statistical tests more suited to our specific data distribution.

Hypothesis 2: The Central region has a higher average sales per order compared to all other regions.

Alpha Level (α = 0.05): Choosing an alpha level of 0.05 as it is most widely used in hypothesis testing.
A higher alpha level increases the likelihood of finding a significant result, but it also increases the risk of a Type I error, hence we are sticking to 0.05.
Power Level (0.8): In the context of supermarket grocery sales, where we are interested in identifying meaningful patterns, trends, or differences, having a reasonable chance of detecting a true effect is crucial for the validity and relevance of our findings. Power is the probability of detecting a true effect, so it is the complement of the Type II (false negative) error rate, while alpha controls the Type I (false positive) error rate. Aiming for a power of 0.80 strikes a balance, as it accepts a 20% chance of not detecting a true effect, which is often deemed reasonable in practice.
Minimum Effect Size (0.1): For the average sales per order across regions, a minimum effect size of 0.1 is chosen to identify a difference that is small but practically meaningful. This acknowledges that even a modest difference in average sales per order might have operational or strategic significance for decision-makers. Setting a smaller effect size allows us to capture subtle variations that, while not large in magnitude, could still be relevant in a business context.
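The alpha, power, and effect-size choices together determine how many orders each group needs. The sketch below uses `power.t.test` from base R's stats package, assuming that the minimum effect size of 0.1 is a standardized difference (Cohen's d, so `delta = 0.1` with `sd = 1`) and that the test is one-sided to match the “higher” wording of the hypothesis. Both are assumptions, since the text does not state the scale of the effect size.

```r
# Sample size per group for a two-sample t-test.
# Assumption: effect size 0.1 is Cohen's d, expressed as delta = 0.1, sd = 1.
sample_size <- power.t.test(delta = 0.1,
                            sd = 1,
                            sig.level = 0.05,
                            power = 0.8,
                            type = "two.sample",
                            alternative = "one.sided")

n_per_group <- ceiling(sample_size$n)
print(n_per_group)
```

Comparing `n_per_group` with the number of orders in the Central region and in the other regions shows whether the data can reasonably detect an effect of this size.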

Performing a Neyman-Pearson hypothesis test

Hypothesis:

H0: The average sales per order in the Central region is equal to the average sales per order in all other regions combined.

H1: The average sales per order in the Central region is higher than the average sales per order in all other regions combined.

Using a two-sample t-test as the data involves comparing the means of two independent groups. In this case, we are comparing the average sales per order in the Central region to the average sales per order in other regions.

Values:

Alpha Level (α): 0.05
Power Level: 0.8. It is not an input to t.test; it is used to judge whether the sample is large enough to detect the minimum effect size.
Minimum Effect Size: 0.1. It is likewise not an input to t.test.

df_central <- data[data$Region == 'Central', ]
df_other_regions <- data[data$Region != 'Central', ]

# Performing a Welch two-sample t-test (two-sided by default)
t_test_result <- t.test(df_central$Sales, df_other_regions$Sales)
print(t_test_result)

## 
##  Welch Two Sample t-test
## 
## data:  df_central$Sales and df_other_regions$Sales
## t = -0.34669, df = 3847.8, p-value = 0.7288
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -31.49041  22.02696
## sample estimates:
## mean of x mean of y 
##  1492.964  1497.696

# Interpreting the results
alpha <- 0.05
if (t_test_result$p.value < alpha) {
  print("Reject the null hypothesis. There is evidence that the Central region has a higher average sales per order.")
} else {
  print("Fail to reject the null hypothesis. The data does not provide enough evidence to conclude that the Central region has a higher average sales per order.")
}

## [1] "Fail to reject the null hypothesis. The data does not provide enough evidence to conclude that the Central region has a higher average sales per order."

The results indicate a t-value of -0.34669 with approximately 3847.8 degrees of freedom and a p-value of 0.7288.
- With a p-value of 0.7288, which is greater than the chosen significance level (α = 0.05), we fail to reject the null hypothesis. Therefore, based on the data, there is not enough evidence to conclude that the Central region has a higher average sales per order compared to other regions.
- The test as run is two-sided, while H1 is one-sided (“higher”). The t-value is negative because the Central sample mean (1492.964) is slightly below the mean of the other regions (1497.696), so the one-sided p-value is 1 − 0.7288/2 ≈ 0.64 and the conclusion is unchanged.
- Also, the 95% confidence interval for the difference in means is (−31.49041, 22.02696). Since this interval includes zero, it further supports the conclusion that there is no statistically significant difference in average sales per order between the Central region and other regions.

This analysis suggests that the Central region does not exhibit a significantly different average sales per order compared to other regions.
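Since H1 is directional while the reported test states “true difference in means is not equal to 0”, it helps to see what the one-sided p-value would be. The sketch below recomputes both p-values from the t statistic and degrees of freedom printed in the output above, using base R; the variable names are illustrative.

```r
# Values copied from the Welch t-test output above.
t_value <- -0.34669
df_welch <- 3847.8

# Two-sided p-value: probability in both tails beyond |t|.
p_two_sided <- 2 * pt(-abs(t_value), df = df_welch)

# One-sided p-value for H1 "Central mean is greater": upper tail only.
p_greater <- pt(t_value, df = df_welch, lower.tail = FALSE)

cat("two-sided p-value:", round(p_two_sided, 4), "\n")
cat("one-sided p-value (greater):", round(p_greater, 4), "\n")
```

Running the original comparison with `alternative = "greater"` in `t.test` would produce this one-sided p-value directly.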

Performing a Fisher’s style test for significance on the same hypothesis

# Performing ANOVA
anova_result <- aov(Sales ~ Region, data = data)
print(summary(anova_result))

##               Df    Sum Sq Mean Sq F value Pr(>F)
## Region         4 3.545e+05   88631   0.266    0.9
## Residuals   9989 3.333e+09  333673

# Interpreting the results
alpha <- 0.05
p_value <- summary(anova_result)[[1]]$`Pr(>F)`[1]

if (p_value < alpha) {
  print("Reject the null hypothesis. There is evidence that at least one region has a different average sales per order.")
} else {
  print("Fail to reject the null hypothesis. The data does not provide enough evidence to conclude that the average sales per order is different across regions.")
}

## [1] "Fail to reject the null hypothesis. The data does not provide enough evidence to conclude that the average sales per order is different across regions."

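The F-statistic and p-value come from a one-way ANOVA, an F-test (named after Fisher) used for comparing the means of several groups. It tests a broader null hypothesis than the t-test above: that all five regional means are equal, rather than Central against the rest.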

Interpretation:

Fail to reject the null hypothesis - There is not enough evidence to conclude that the average sales per order is different across regions.

The analysis suggests that there is no significant difference in average sales per order among the different regions. The high p-value (0.9) indicates that the observed differences are consistent with random variation, and we don’t have enough evidence to say that the regions have distinct average sales.
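The size of the regional differences can be read off the numbers already reported. The sketch below recomputes the F statistic and its p-value from the ANOVA table, then derives two effect-size measures: eta-squared (share of variance explained by Region) and a standardized Central-versus-others difference that uses the residual mean square as the variance estimate. The last choice is an assumption made for illustration; the inputs are rounded values from the outputs above, so the results are approximate.

```r
# Rounded values copied from the ANOVA table above.
ss_region <- 3.545e5
ss_residuals <- 3.333e9
df_region <- 4
df_residuals <- 9989

# F statistic and its upper-tail p-value.
f_value <- (ss_region / df_region) / (ss_residuals / df_residuals)
p_value <- pf(f_value, df_region, df_residuals, lower.tail = FALSE)

# Eta-squared: proportion of total variance attributable to Region.
eta_squared <- ss_region / (ss_region + ss_residuals)

# Standardized mean difference, Central minus others (means from the t-test).
# Assumption: the residual mean square (333673) serves as the pooled variance.
cohens_d <- (1492.964 - 1497.696) / sqrt(333673)

cat("F:", round(f_value, 3), " p:", round(p_value, 2), "\n")
cat("eta-squared:", signif(eta_squared, 3), "\n")
cat("standardized difference:", round(cohens_d, 4), "\n")
```

Both measures come out very close to zero, well below the minimum effect size of 0.1 chosen for this hypothesis.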

Visualization to illustrate the results of hypothesis 2

library(ggplot2)

df_central <- data[data$Region == 'Central', ]
df_other_regions <- data[data$Region != 'Central', ]

# Creating a box plot
ggplot(data, aes(x = Region, y = Sales, fill = Region)) +
  geom_boxplot() +
  theme_minimal() +
  labs(title = "Comparison of Average Sales per Order by Region",
       x = "Region",
       y = "Sales") +
  theme(legend.position = "none")

- The box plots look very similar from region to region, which indicates that the distribution of sales is relatively consistent across all regions.
- In the context of our hypothesis (“The Central region has a higher average sales per order compared to all other regions”), the visual representation doesn’t provide strong evidence supporting this claim. A box plot shows medians and quartiles rather than means, so it is a visual check on the distributions and not a direct comparison of averages, but the similar boxes suggest that the average sales per order might not significantly differ among the regions.
- It aligns with the statistical tests and interpretations we performed earlier, where we failed to reject the null hypothesis, indicating a lack of sufficient evidence to conclude that the average sales per order is different across regions.
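Because the hypothesis concerns averages, a per-region table of means and medians is a useful companion to the box plot. The sketch below builds one with base R on simulated data. The region labels other than “Central” and the sales values are made up for illustration and do not come from the Supermart file.

```r
# Simulated orders for illustration only (not the Supermart data).
set.seed(7)
demo_sales <- data.frame(
  Region = rep(c("Central", "Region B", "Region C", "Region D", "Region E"),
               each = 200),
  Sales = round(runif(1000, min = 500, max = 2500))
)

# Mean and median sales per region.
region_means <- tapply(demo_sales$Sales, demo_sales$Region, mean)
region_medians <- tapply(demo_sales$Sales, demo_sales$Region, median)

region_summary <- data.frame(
  region = names(region_means),
  mean_sales = as.vector(region_means),
  median_sales = as.vector(region_medians)
)
print(region_summary)
```

On the real data the same two `tapply` calls would take `data$Sales` and `data$Region`, assuming those column names as used in the code above.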

To strengthen the evidence and reach a firmer conclusion, we can do the following:
- Collect data from a larger sample if possible. More data can provide a clearer picture and increase the power of statistical tests.
- Conduct the same analysis on different datasets or subsets to see if the pattern holds consistently.
- Investigate if there are other statistical tests more suited to our specific data distribution.

2024-02-23

Loading the ‘Supermart’ CSV file located on desktop
