Case Study 1: Wine Tasting - Suggestion and Perception

Problem Description

A researcher wants to verify whether the environmental context (suggestion) influences wine evaluation by experts.

Experimental setup:

  • 24 wine experts
  • Same wine served in 3 different environments:
    • Group A: elegant environment (n=8)
    • Group B: neutral environment (n=8)
    • Group C: poor environment (n=8)
  • Each expert assigns a score from 1 to 10

Research question: Does suggestion influence the average wine score?

Data Import

# Set working directory
setwd("C:/Users/User1/Desktop/SIMUL/EX")

# Read data from TXT file
wine_data <- read.table("Ex1_WineTasting.txt", header = TRUE, sep = "\t")

# Check data structure
str(wine_data)
## 'data.frame':    24 obs. of  2 variables:
##  $ group: chr  "A" "A" "A" "A" ...
##  $ score: num  7.2 8.1 7.5 8.3 7.8 8 7.6 7.9 6.5 6.8 ...
head(wine_data)
##   group score
## 1     A   7.2
## 2     A   8.1
## 3     A   7.5
## 4     A   8.3
## 5     A   7.8
## 6     A   8.0
# Summary statistics
summary(wine_data)
##     group               score      
##  Length:24          Min.   :5.400  
##  Class :character   1st Qu.:5.975  
##  Mode  :character   Median :6.550  
##                     Mean   :6.700  
##                     3rd Qu.:7.525  
##                     Max.   :8.300

1. Hypothesis Testing

Null Hypothesis (H₀): Suggestion does not affect wine scores.

  • μ_A = μ_B = μ_C (exchangeability under H₀)
  • The group labels are arbitrary and can be permuted freely.

Alternative Hypothesis (H₁): Suggestion affects wine scores.

  • At least one mean differs: μ_i ≠ μ_j for some i ≠ j

Significance level: α = 0.05

# Display group means to see if there are visible differences
aggregate(score ~ group, data = wine_data, FUN = mean)
##   group score
## 1     A  7.80
## 2     B  6.55
## 3     C  5.75

Group A, the one who tested the wine in an elegant environment, has higher mean score.

2. Permutation Strategy

We permute the group labels across the observed scores.

# Total number of informative permutations
n_total <- (factorial(24) / (factorial(8)^3))
cat("Total permutations:", format(n_total, scientific = TRUE), "\n")
## Total permutations: 9.465512e+09
# We will use Monte Carlo test since n_total > 1 million
n_perms <- 9999
cat("Using Monte Carlo test with", n_perms, "permutations\n")
## Using Monte Carlo test with 9999 permutations

3. Test Statistic

Why ANOVA F-statistic?

We are comparing more than 2 groups, so we can’t use a simple t-test. The F-statistic from one-way ANOVA measures how different the group means are relative to the variability within each group.

F-statistic formula: \[F = \frac{\text{Variance between groups}}{\text{Variance within groups}}\]

  • Large F: group means are very different → evidence against the null hypothesis
  • Small F: group means are similar → consistent with the null hypothesis
# Calculate observed F-statistic using one-way ANOVA
observed_F <- oneway.test(score ~ group, data = wine_data, var.equal = TRUE)$statistic

cat("F-statistic:", observed_F, "\n")
## F-statistic: 104.2674
# The higher the F, the stronger the evidence that groups differ

4. Permutation Test and P-value Calculation

set.seed(123)  # for reproducibility

# Initialize vector to store permuted F-statistics
F_perms <- numeric(n_perms)

# Permutation loop
for (i in 1:n_perms) {
  # Randomly permute the group labels
  permuted_groups <- sample(wine_data$group)
  
  # Calculate F-statistic for this permutation using aov
  perm_aov <- aov(wine_data$score ~ permuted_groups)
  F_perms[i] <- summary(perm_aov)[[1]][1, 4]  # extract F-value
}
rm(i)

# Check distribution of permuted F-statistics
cat("Summary of permuted F-statistics:\n")
## Summary of permuted F-statistics:
print(summary(F_perms))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.2810  0.7016  1.1229  1.4727 20.1522
cat("\nObserved F:", observed_F, "\n")
## 
## Observed F: 104.2674
# Calculate empirical p-value
# Count how many permuted F values are >= observed F
# +1 accounts for the observed data itself (avoids p-value = 0)
p_value <- (sum(F_perms >= observed_F) + 1) / (n_perms + 1)

cat("P-value:", p_value, "\n")
## P-value: 1e-04
cat("Number of permutations with F >=", observed_F, ":", sum(F_perms >= observed_F), "\n")
## Number of permutations with F >= 104.2674 : 0

Visualization of Permutation Distribution

# Histogram of permuted F-statistics
hist(F_perms, breaks = 50, col = "lightblue", 
     main = "Permutation Distribution of F-statistic",
     xlab = "F-statistic", 
     las = 1)

# Add vertical line for observed F
abline(v = observed_F, col = "red", lwd = 2, lty = 2)

# Add legend
legend("topright", legend = "Observed F", 
       col = "red", lty = 2, lwd = 2)

6. Conclusions

Based on the permutation test results, we can draw the following conclusions:

Statistical Decision:

The p-value (0.0001) is much smaller than the significance level α = 0.05, therefore we reject the null hypothesis.

Interpretation:

There is strong statistical evidence that environmental suggestion affects wine evaluation scores. The observed F-statistic (104.27) is extremely rare under the assumption of exchangeability: none of the 9999 permutations produced an F-statistic as large as the observed value.

7. Pairwise Comparisons

Since we rejected H₀, we need to investigate where the differences lie. We perform pairwise permutation tests between all group pairs: A vs B, A vs C, B vs C.

Multiple testing problem:

When doing multiple tests, we need to adjust the p-values to control the overall error rate. We use the Bonferroni correction: divide α by the number of comparisons.

  • Number of comparisons: 3
  • Adjusted significance level: α_adjusted = 0.05 / 3 = 0.0167
set.seed(123)
n_perms_pair <- 9999

# --- Comparison A vs B ---
# Subset data
data_AB <- wine_data[wine_data$group %in% c("A", "B"), ]

# Observed difference in means
obs_diff_AB <- abs(mean(data_AB$score[data_AB$group == "A"]) - 
                   mean(data_AB$score[data_AB$group == "B"]))

# Permutation loop
perm_diffs_AB <- numeric(n_perms_pair)
for (i in 1:n_perms_pair) {
  perm_groups <- sample(data_AB$group)
  perm_diffs_AB[i] <- abs(mean(data_AB$score[perm_groups == "A"]) - 
                          mean(data_AB$score[perm_groups == "B"]))
}

# P-value
p_AB <- (sum(perm_diffs_AB >= obs_diff_AB) + 1) / (n_perms_pair + 1)
cat("A vs B: Difference =", round(obs_diff_AB, 3), ", p-value =", p_AB, "\n")
## A vs B: Difference = 1.25 , p-value = 3e-04
# --- Comparison A vs C ---
data_AC <- wine_data[wine_data$group %in% c("A", "C"), ]
obs_diff_AC <- abs(mean(data_AC$score[data_AC$group == "A"]) - 
                   mean(data_AC$score[data_AC$group == "C"]))

# Permutation loop
perm_diffs_AC <- numeric(n_perms_pair)
for (i in 1:n_perms_pair) {
  perm_groups <- sample(data_AC$group)
  perm_diffs_AC[i] <- abs(mean(data_AC$score[perm_groups == "A"]) - 
                          mean(data_AC$score[perm_groups == "C"]))
}

# P-value
p_AC <- (sum(perm_diffs_AC >= obs_diff_AC) + 1) / (n_perms_pair + 1)
cat("A vs C: Difference =", round(obs_diff_AC, 3), ", p-value =", p_AC, "\n")
## A vs C: Difference = 2.05 , p-value = 3e-04
# --- Comparison B vs C ---
data_BC <- wine_data[wine_data$group %in% c("B", "C"), ]
obs_diff_BC <- abs(mean(data_BC$score[data_BC$group == "B"]) - 
                   mean(data_BC$score[data_BC$group == "C"]))

# Permutation loop
perm_diffs_BC <- numeric(n_perms_pair)
for (i in 1:n_perms_pair) {
  perm_groups <- sample(data_BC$group)
  perm_diffs_BC[i] <- abs(mean(data_BC$score[perm_groups == "B"]) - 
                          mean(data_BC$score[perm_groups == "C"]))
}

# P-value
p_BC <- (sum(perm_diffs_BC >= obs_diff_BC) + 1) / (n_perms_pair + 1)
cat("B vs C: Difference =", round(obs_diff_BC, 3), ", p-value =", p_BC, "\n")
## B vs C: Difference = 0.8 , p-value = 3e-04
# Bonferroni-adjusted significance level
cat("\nBonferroni-adjusted α: 0.05 / 3 =", 0.05/3, "\n")
## 
## Bonferroni-adjusted α: 0.05 / 3 = 0.01666667

Results interpretation:

Comparing the p-values with the Bonferroni-adjusted threshold (α = 0.0167):

  • A vs B: The elegant environment produces significantly higher scores than the neutral environment.
  • A vs C: The elegant environment produces significantly higher scores than the poor environment.
  • B vs C: The neutral and poor environments show a significant difference as well.

All three pairwise comparisons are statistically significant after Bonferroni correction, indicating that each environmental context produces distinctly different wine evaluations.

Case Study 3: Aspirine and Bledding time

Problem Description

A researcher wants to test whether aspirin increases bleeding time.

Experimental setup: + 14 individuals + Each person measured before and after taking aspirin + Paired data: same individuals, two measurements

Research question: Does aspirin increase bleeding time?

Data import

# Read data from TXT file
aspirin_data <- read.table("Ex3_Aspirin.txt", 
                           header = TRUE, 
                           sep = "\t")

# Check data structure
head(aspirin_data)
##   before after
## 1    8.2   9.1
## 2    7.5   8.3
## 3    6.8   7.9
## 4    9.1  10.2
## 5    7.3   8.1
## 6    8.0   9.5
str(aspirin_data)
## 'data.frame':    14 obs. of  2 variables:
##  $ before: num  8.2 7.5 6.8 9.1 7.3 8 6.5 7.8 8.5 7.1 ...
##  $ after : num  9.1 8.3 7.9 10.2 8.1 9.5 7.2 8.6 9.8 7.9 ...
# Calculate difference for each individual (after - before)
aspirin_data$difference <- aspirin_data$after - aspirin_data$before

# Display differences
print(aspirin_data[3])
##    difference
## 1         0.9
## 2         0.8
## 3         1.1
## 4         1.1
## 5         0.8
## 6         1.5
## 7         0.7
## 8         0.8
## 9         1.3
## 10        0.8
## 11        1.1
## 12        1.1
## 13        1.2
## 14        1.5

1. Hypothesis Testing

Null Hypothesis (H₀): Aspirin does not affect bleeding time.
- δ = 0 (mean difference = 0) - The differences (after - before) are symmetrically distributed around zero

Alternative Hypothesis (H₁): Aspirin increases bleeding time.
- δ > 0 (mean difference > 0) - This is a one-sided test (we expect positive differences)

Significance level: α = 0.05

# Calculate mean difference
mean_diff <- mean(aspirin_data$difference)
cat("Mean difference (after - before):", round(mean_diff, 3), "\n")
## Mean difference (after - before): 1.05
# Positive mean suggests aspirin increases bleeding time

2. Permutation Strategy

For paired data, we permute the sign of each difference. Under H₀, if there’s no effect, each difference could equally be positive or negative (we could swap “before” and “after” for each individual).

How many permutations?

# Each individual can have their difference changed: + or -
# Total permutations = 2^n where n = number of individuals
n_individuals <- nrow(aspirin_data)
n_total <- 2^n_individuals


cat("Total permutations: ", format(n_total, scientific = FALSE), "\n")
## Total permutations:  16384
# Decision: exact test since 16384 < 1 million
cat("\nUsing exact permutation test\n")
## 
## Using exact permutation test

3. Test Statistic

Which statistic to use?

For paired data, we use the mean of the differences as our test statistic. Since we have only one population (same individuals measured twice), we cannot use difference of means. Instead, we compute the mean of (after - before) values. Under H₀, this should be close to zero.

# Observed test statistic: mean of differences
observed_mean <- mean(aspirin_data$difference)

cat("Observed mean difference:", observed_mean, "\n")
## Observed mean difference: 1.05
# Positive value suggests aspirin increases bleeding time

4. Permutation Test and P-value Calculation

For each permutation, we randomly flip the sign of each difference (+ becomes -, or stays +). This simulates the null hypothesis where the order (before/after) is arbitrary.

set.seed(123)  # for reproducibility

# Total number of permutations
n_perms <- 2^n_individuals

# We'll use a different approach: generate all possible sign combinations

# Initialize vector for permuted means
means_perms <- numeric(n_perms)

# Generate all possible sign flips
for (i in 1:n_perms) {
  # Create random signs: +1 or -1 for each individual
  signs <- sample(c(-1, 1), n_individuals, replace = TRUE)
  
  # Apply signs to differences
  permuted_diff <- aspirin_data$difference * signs
  
  # Calculate mean of permuted differences
  means_perms[i] <- mean(permuted_diff)
}

# Calculate empirical p-value
# Count how many permuted means are >= observed mean
p_value <- (sum(means_perms >= observed_mean) + 1) / (n_perms + 1)
cat("P-value:", p_value, "\n")
## P-value: 0.0001220629
cat("Number of permutations with mean >=", observed_mean, ":", sum(means_perms >= observed_mean), "\n")
## Number of permutations with mean >= 1.05 : 1

Visualization of Permutation Distribution

# Histogram of permuted mean differences
hist(means_perms, breaks = 50, col = "lightblue", 
     main = "Permutation Distribution of Mean Differences",
     xlab = "Mean Difference (after - before)", 
     las = 1)

# Add vertical line for observed mean
abline(v = observed_mean, col = "red", lwd = 2, lty = 2)

# Add vertical line at zero (H₀)
abline(v = 0, col = "black", lwd = 3, lty = 1)

# Add legend
legend("topright", 
       legend = c("Observed Mean"), 
       col = c("red"), 
       lty = c(2, 3), 
       lwd = c(2, 1))

6. Conclusions

Based on the permutation test results, we can draw the following conclusions:

Statistical Decision:

The p-value obtained from the exact permutation test is compared with the significance level α = 0.05. If p-value ≤ 0.05, we reject H₀.

Interpretation:

Since this is a one-sided test, we are specifically testing whether aspirin increases bleeding time. The permutation distribution shows us how likely it would be to observe our mean difference (or larger) if aspirin had no effect.

If the observed mean difference is significantly larger than what we would expect under random sign flips, this provides evidence that aspirin genuinely increases bleeding time rather than the differences being due to chance.

alpha <- 0.05

cat("P-value:", p_value, "\n")
## P-value: 0.0001220629
cat("Observed mean difference:", round(observed_mean, 3), "\n")
## Observed mean difference: 1.05
if (p_value <= alpha) {
  cat("\nDecision: Reject H₀\n")
  cat("Conclusion: Aspirin increases bleeding time.\n")
} else {
  cat("\nDecision: Do not reject H₀\n")
  cat("Conclusion: No evidence that aspirin increases bleeding time.\n")
}
## 
## Decision: Reject H₀
## Conclusion: Aspirin increases bleeding time.