A researcher wants to verify whether the environmental context (suggestion) influences wine evaluation by experts.
Experimental setup:
Research question: Does suggestion influence the average wine score?
# Set working directory
setwd("C:/Users/User1/Desktop/SIMUL/EX")
# Read data from TXT file
wine_data <- read.table("Ex1_WineTasting.txt", header = TRUE, sep = "\t")
# Check data structure
str(wine_data)
## 'data.frame': 24 obs. of 2 variables:
## $ group: chr "A" "A" "A" "A" ...
## $ score: num 7.2 8.1 7.5 8.3 7.8 8 7.6 7.9 6.5 6.8 ...
head(wine_data)
## group score
## 1 A 7.2
## 2 A 8.1
## 3 A 7.5
## 4 A 8.3
## 5 A 7.8
## 6 A 8.0
# Summary statistics
summary(wine_data)
## group score
## Length:24 Min. :5.400
## Class :character 1st Qu.:5.975
## Mode :character Median :6.550
## Mean :6.700
## 3rd Qu.:7.525
## Max. :8.300
Null Hypothesis (H₀): Suggestion does not affect wine scores.
Alternative Hypothesis (H₁): Suggestion affects wine scores.
Significance level: α = 0.05
# Display group means to see if there are visible differences
aggregate(score ~ group, data = wine_data, FUN = mean)
## group score
## 1 A 7.80
## 2 B 6.55
## 3 C 5.75
Group A, the one who tested the wine in an elegant environment, has higher mean score.
We permute the group labels across the observed scores.
# Total number of informative permutations
n_total <- (factorial(24) / (factorial(8)^3))
cat("Total permutations:", format(n_total, scientific = TRUE), "\n")
## Total permutations: 9.465512e+09
# We will use Monte Carlo test since n_total > 1 million
n_perms <- 9999
cat("Using Monte Carlo test with", n_perms, "permutations\n")
## Using Monte Carlo test with 9999 permutations
Why ANOVA F-statistic?
We are comparing more than 2 groups, so we can’t use a simple t-test. The F-statistic from one-way ANOVA measures how different the group means are relative to the variability within each group.
F-statistic formula: \[F = \frac{\text{Variance between groups}}{\text{Variance within groups}}\]
# Calculate observed F-statistic using one-way ANOVA
observed_F <- oneway.test(score ~ group, data = wine_data, var.equal = TRUE)$statistic
cat("F-statistic:", observed_F, "\n")
## F-statistic: 104.2674
# The higher the F, the stronger the evidence that groups differ
set.seed(123) # for reproducibility
# Initialize vector to store permuted F-statistics
F_perms <- numeric(n_perms)
# Permutation loop
for (i in 1:n_perms) {
# Randomly permute the group labels
permuted_groups <- sample(wine_data$group)
# Calculate F-statistic for this permutation using aov
perm_aov <- aov(wine_data$score ~ permuted_groups)
F_perms[i] <- summary(perm_aov)[[1]][1, 4] # extract F-value
}
rm(i)
# Check distribution of permuted F-statistics
cat("Summary of permuted F-statistics:\n")
## Summary of permuted F-statistics:
print(summary(F_perms))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2810 0.7016 1.1229 1.4727 20.1522
cat("\nObserved F:", observed_F, "\n")
##
## Observed F: 104.2674
# Calculate empirical p-value
# Count how many permuted F values are >= observed F
# +1 accounts for the observed data itself (avoids p-value = 0)
p_value <- (sum(F_perms >= observed_F) + 1) / (n_perms + 1)
cat("P-value:", p_value, "\n")
## P-value: 1e-04
cat("Number of permutations with F >=", observed_F, ":", sum(F_perms >= observed_F), "\n")
## Number of permutations with F >= 104.2674 : 0
# Histogram of permuted F-statistics
hist(F_perms, breaks = 50, col = "lightblue",
main = "Permutation Distribution of F-statistic",
xlab = "F-statistic",
las = 1)
# Add vertical line for observed F
abline(v = observed_F, col = "red", lwd = 2, lty = 2)
# Add legend
legend("topright", legend = "Observed F",
col = "red", lty = 2, lwd = 2)
Based on the permutation test results, we can draw the following conclusions:
Statistical Decision:
The p-value (0.0001) is much smaller than the significance level α = 0.05, therefore we reject the null hypothesis.
Interpretation:
There is strong statistical evidence that environmental suggestion affects wine evaluation scores. The observed F-statistic (104.27) is extremely rare under the assumption of exchangeability: none of the 9999 permutations produced an F-statistic as large as the observed value.
Since we rejected H₀, we need to investigate where the differences lie. We perform pairwise permutation tests between all group pairs: A vs B, A vs C, B vs C.
Multiple testing problem:
When doing multiple tests, we need to adjust the p-values to control the overall error rate. We use the Bonferroni correction: divide α by the number of comparisons.
set.seed(123)
n_perms_pair <- 9999
# --- Comparison A vs B ---
# Subset data
data_AB <- wine_data[wine_data$group %in% c("A", "B"), ]
# Observed difference in means
obs_diff_AB <- abs(mean(data_AB$score[data_AB$group == "A"]) -
mean(data_AB$score[data_AB$group == "B"]))
# Permutation loop
perm_diffs_AB <- numeric(n_perms_pair)
for (i in 1:n_perms_pair) {
perm_groups <- sample(data_AB$group)
perm_diffs_AB[i] <- abs(mean(data_AB$score[perm_groups == "A"]) -
mean(data_AB$score[perm_groups == "B"]))
}
# P-value
p_AB <- (sum(perm_diffs_AB >= obs_diff_AB) + 1) / (n_perms_pair + 1)
cat("A vs B: Difference =", round(obs_diff_AB, 3), ", p-value =", p_AB, "\n")
## A vs B: Difference = 1.25 , p-value = 3e-04
# --- Comparison A vs C ---
data_AC <- wine_data[wine_data$group %in% c("A", "C"), ]
obs_diff_AC <- abs(mean(data_AC$score[data_AC$group == "A"]) -
mean(data_AC$score[data_AC$group == "C"]))
# Permutation loop
perm_diffs_AC <- numeric(n_perms_pair)
for (i in 1:n_perms_pair) {
perm_groups <- sample(data_AC$group)
perm_diffs_AC[i] <- abs(mean(data_AC$score[perm_groups == "A"]) -
mean(data_AC$score[perm_groups == "C"]))
}
# P-value
p_AC <- (sum(perm_diffs_AC >= obs_diff_AC) + 1) / (n_perms_pair + 1)
cat("A vs C: Difference =", round(obs_diff_AC, 3), ", p-value =", p_AC, "\n")
## A vs C: Difference = 2.05 , p-value = 3e-04
# --- Comparison B vs C ---
data_BC <- wine_data[wine_data$group %in% c("B", "C"), ]
obs_diff_BC <- abs(mean(data_BC$score[data_BC$group == "B"]) -
mean(data_BC$score[data_BC$group == "C"]))
# Permutation loop
perm_diffs_BC <- numeric(n_perms_pair)
for (i in 1:n_perms_pair) {
perm_groups <- sample(data_BC$group)
perm_diffs_BC[i] <- abs(mean(data_BC$score[perm_groups == "B"]) -
mean(data_BC$score[perm_groups == "C"]))
}
# P-value
p_BC <- (sum(perm_diffs_BC >= obs_diff_BC) + 1) / (n_perms_pair + 1)
cat("B vs C: Difference =", round(obs_diff_BC, 3), ", p-value =", p_BC, "\n")
## B vs C: Difference = 0.8 , p-value = 3e-04
# Bonferroni-adjusted significance level
cat("\nBonferroni-adjusted α: 0.05 / 3 =", 0.05/3, "\n")
##
## Bonferroni-adjusted α: 0.05 / 3 = 0.01666667
Results interpretation:
Comparing the p-values with the Bonferroni-adjusted threshold (α = 0.0167):
All three pairwise comparisons are statistically significant after Bonferroni correction, indicating that each environmental context produces distinctly different wine evaluations.
A researcher wants to test whether aspirin increases bleeding time.
Experimental setup: + 14 individuals + Each person measured before and after taking aspirin + Paired data: same individuals, two measurements
Research question: Does aspirin increase bleeding time?
# Read data from TXT file
aspirin_data <- read.table("Ex3_Aspirin.txt",
header = TRUE,
sep = "\t")
# Check data structure
head(aspirin_data)
## before after
## 1 8.2 9.1
## 2 7.5 8.3
## 3 6.8 7.9
## 4 9.1 10.2
## 5 7.3 8.1
## 6 8.0 9.5
str(aspirin_data)
## 'data.frame': 14 obs. of 2 variables:
## $ before: num 8.2 7.5 6.8 9.1 7.3 8 6.5 7.8 8.5 7.1 ...
## $ after : num 9.1 8.3 7.9 10.2 8.1 9.5 7.2 8.6 9.8 7.9 ...
# Calculate difference for each individual (after - before)
aspirin_data$difference <- aspirin_data$after - aspirin_data$before
# Display differences
print(aspirin_data[3])
## difference
## 1 0.9
## 2 0.8
## 3 1.1
## 4 1.1
## 5 0.8
## 6 1.5
## 7 0.7
## 8 0.8
## 9 1.3
## 10 0.8
## 11 1.1
## 12 1.1
## 13 1.2
## 14 1.5
Null Hypothesis (H₀): Aspirin does not affect
bleeding time.
- δ = 0 (mean difference = 0) - The differences (after - before) are
symmetrically distributed around zero
Alternative Hypothesis (H₁): Aspirin increases
bleeding time.
- δ > 0 (mean difference > 0) - This is a one-sided
test (we expect positive differences)
Significance level: α = 0.05
# Calculate mean difference
mean_diff <- mean(aspirin_data$difference)
cat("Mean difference (after - before):", round(mean_diff, 3), "\n")
## Mean difference (after - before): 1.05
# Positive mean suggests aspirin increases bleeding time
For paired data, we permute the sign of each difference. Under H₀, if there’s no effect, each difference could equally be positive or negative (we could swap “before” and “after” for each individual).
How many permutations?
# Each individual can have their difference changed: + or -
# Total permutations = 2^n where n = number of individuals
n_individuals <- nrow(aspirin_data)
n_total <- 2^n_individuals
cat("Total permutations: ", format(n_total, scientific = FALSE), "\n")
## Total permutations: 16384
# Decision: exact test since 16384 < 1 million
cat("\nUsing exact permutation test\n")
##
## Using exact permutation test
Which statistic to use?
For paired data, we use the mean of the differences as our test statistic. Since we have only one population (same individuals measured twice), we cannot use difference of means. Instead, we compute the mean of (after - before) values. Under H₀, this should be close to zero.
# Observed test statistic: mean of differences
observed_mean <- mean(aspirin_data$difference)
cat("Observed mean difference:", observed_mean, "\n")
## Observed mean difference: 1.05
# Positive value suggests aspirin increases bleeding time
For each permutation, we randomly flip the sign of each difference (+ becomes -, or stays +). This simulates the null hypothesis where the order (before/after) is arbitrary.
set.seed(123) # for reproducibility
# Total number of permutations
n_perms <- 2^n_individuals
# We'll use a different approach: generate all possible sign combinations
# Initialize vector for permuted means
means_perms <- numeric(n_perms)
# Generate all possible sign flips
for (i in 1:n_perms) {
# Create random signs: +1 or -1 for each individual
signs <- sample(c(-1, 1), n_individuals, replace = TRUE)
# Apply signs to differences
permuted_diff <- aspirin_data$difference * signs
# Calculate mean of permuted differences
means_perms[i] <- mean(permuted_diff)
}
# Calculate empirical p-value
# Count how many permuted means are >= observed mean
p_value <- (sum(means_perms >= observed_mean) + 1) / (n_perms + 1)
cat("P-value:", p_value, "\n")
## P-value: 0.0001220629
cat("Number of permutations with mean >=", observed_mean, ":", sum(means_perms >= observed_mean), "\n")
## Number of permutations with mean >= 1.05 : 1
# Histogram of permuted mean differences
hist(means_perms, breaks = 50, col = "lightblue",
main = "Permutation Distribution of Mean Differences",
xlab = "Mean Difference (after - before)",
las = 1)
# Add vertical line for observed mean
abline(v = observed_mean, col = "red", lwd = 2, lty = 2)
# Add vertical line at zero (H₀)
abline(v = 0, col = "black", lwd = 3, lty = 1)
# Add legend
legend("topright",
legend = c("Observed Mean"),
col = c("red"),
lty = c(2, 3),
lwd = c(2, 1))
Based on the permutation test results, we can draw the following conclusions:
Statistical Decision:
The p-value obtained from the exact permutation test is compared with the significance level α = 0.05. If p-value ≤ 0.05, we reject H₀.
Interpretation:
Since this is a one-sided test, we are specifically testing whether aspirin increases bleeding time. The permutation distribution shows us how likely it would be to observe our mean difference (or larger) if aspirin had no effect.
If the observed mean difference is significantly larger than what we would expect under random sign flips, this provides evidence that aspirin genuinely increases bleeding time rather than the differences being due to chance.
alpha <- 0.05
cat("P-value:", p_value, "\n")
## P-value: 0.0001220629
cat("Observed mean difference:", round(observed_mean, 3), "\n")
## Observed mean difference: 1.05
if (p_value <= alpha) {
cat("\nDecision: Reject H₀\n")
cat("Conclusion: Aspirin increases bleeding time.\n")
} else {
cat("\nDecision: Do not reject H₀\n")
cat("Conclusion: No evidence that aspirin increases bleeding time.\n")
}
##
## Decision: Reject H₀
## Conclusion: Aspirin increases bleeding time.