Sensory discrimination tests are designed to determine whether a perceptible difference exists between two products. Unlike descriptive or affective tests, the goal is not to describe or rate the intensity of attributes, but simply to assess whether a difference can be detected by trained or untrained panelists. Common discrimination tests include the triangle test, the duo-trio test, 2-alternative forced choice (2-AFC), 3-AFC, the tetrad test, and others (International Organization for Standardization 2004).
Statistical analysis of the results is based on probability theory, using tools like the binomial distribution, chi-squared tests, or z-tests for proportions. For standard test formats, specialized functions in the sensR package streamline the analysis process.
The triangle test is a sensory discrimination method in which each panelist receives three samples: two are identical and one is different. The panelist must identify the odd sample.
We simulate responses from 30 panelists for illustration (1 = correct, 0 = incorrect).
set.seed(123)
# Simulate 30 binary responses; the true probability of a correct answer is set
# to 0.5, i.e. above the 1/3 chance level, so the panel behaves as if a
# difference were perceptible
Triangle <- data.frame(Correct_answer = rbinom(30, 1, prob = 0.5))
The exact binomial test compares the observed number of correct answers to the chance level (1/3 for the triangle test).
data <- data.frame(answer = Triangle$Correct_answer)
binom.test(sum(data$answer), nrow(data), p = 1/3)
##
## Exact binomial test
##
## data: sum(data$answer) and nrow(data)
## number of successes = 19, number of trials = 30, p-value = 0.0008206
## alternative hypothesis: true probability of success is not equal to 0.3333333
## 95 percent confidence interval:
## 0.4385598 0.8007014
## sample estimates:
## probability of success
## 0.6333333
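As a complement to the p-value, the same binomial distribution gives the minimum number of correct answers a panel of a given size needs to reach significance. A minimal base-R sketch, using a one-sided alternative since discrimination tests ask only whether performance exceeds chance:
# Smallest number of correct answers (out of 30) significant at alpha = 0.05
n_panelists <- 30
alpha <- 0.05
critical <- qbinom(1 - alpha, size = n_panelists, prob = 1/3) + 1
critical # 15 for a panel of 30, in line with published triangle-test tables
# Tail probability at the critical value under pure guessing (at most alpha)
pbinom(critical - 1, size = n_panelists, prob = 1/3, lower.tail = FALSE)
The 19 correct answers observed above are well beyond this threshold, consistent with the significant binomial test.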
The chi-squared goodness-of-fit test compares the observed frequencies of correct and incorrect answers to the distribution expected under guessing.
observed <- table(factor(data$answer, levels = c(0,1)))
names(observed) <- c("Incorrect", "Correct")
expected <- c("Incorrect" = length(data$answer) * 2 / 3,
"Correct" = length(data$answer) * 1 / 3)
chisq.test(x = observed, p = expected / sum(expected))
##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 12.15, df = 1, p-value = 0.0004909
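For transparency, the reported statistic can be reproduced directly from the chi-squared formula by summing (observed - expected)^2 / expected over the two cells:
# Reproduce the chi-squared statistic from its definition
sum((observed - expected)^2 / expected)
# (11 - 20)^2 / 20 + (19 - 10)^2 / 10 = 4.05 + 8.1 = 12.15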
The z-test for proportions tests whether the observed proportion of correct answers differs from the expected chance level.
observed_prob <- mean(data$answer)
expected_prob <- 1/3
n <- length(data$answer)
z <- (observed_prob - expected_prob) / sqrt(expected_prob * (1 - expected_prob) / n)
p_value <- 2 * pnorm(-abs(z)) # Two-tailed test
z
## [1] 3.485685
p_value
## [1] 0.0004908786
While the binomial test, the z-test for proportions, and the chi-squared test all aim to determine whether the observed number of correct responses significantly differs from what is expected by chance, their sensitivity can vary. In most cases, the binomial test is preferred for sensory discrimination tests because it is exact, especially with small to moderate sample sizes. The z-test provides a good approximation when the sample size is large enough, while the chi-squared test may be less reliable with only two response categories or low expected frequencies. (With two categories the chi-squared statistic is simply the square of the z statistic, 3.4857^2 ≈ 12.15, which is why the two tests return the same p-value here.) Additionally, the Yates continuity correction can be applied to the chi-squared test to reduce bias in small samples, although it may lead to more conservative results (Hernández Montes 2016).
The Yates continuity correction is commonly applied to chi-squared tests with two categories (2x1 or 2x2 tables) to adjust for the overestimation of significance that comes from approximating a discrete distribution with a continuous one (Agresti 2018). Note, however, that R's chisq.test() applies the correction only to 2x2 contingency tables; for a goodness-of-fit test on a single vector of counts, as here, the correct = TRUE argument is silently ignored, which is why the output below is identical to the uncorrected test. A manual version of the correction is sketched after the output.
chisq.test(x = observed, p = expected / sum(expected), correct = TRUE)
##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 12.15, df = 1, p-value = 0.0004909
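Since chisq.test() does not perform the correction for this input, a minimal manual sketch shows what the Yates-corrected statistic looks like when applied to the counts above (base R's prop.test() applies the same correction by default).
# Manual Yates correction for the two-category goodness-of-fit case:
# subtract 0.5 from each |observed - expected| difference before squaring
X2_yates <- sum((abs(observed - expected) - 0.5)^2 / expected)
p_yates <- pchisq(X2_yates, df = 1, lower.tail = FALSE)
X2_yates # about 10.84, versus 12.15 without the correction
p_yates # about 0.001, slightly larger (more conservative) than before
# prop.test(19, 30, p = 1/3) reproduces this corrected statistic by default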
✅ Use when you have small sample sizes and only two response categories
⚠️ Can make the test more conservative, possibly increasing the p-value
❌ Not needed when using the exact binomial test
The same statistical logic used for the triangle test can be applied to other discrimination tests such as:
Duo-Trio (chance = 1/2)
Tetrad (chance = 1/3)
Hexad (chance = 1/10)
Two-out-of-Five (2-of-5) (chance = 1/10)
To analyze these tests, simply change the value of the chance probability (p) in the binomial test, z-test, and chi-squared test accordingly.
Example: Two-out-of-Five Test. Suppose we tested 25 panelists, and 9 of them correctly identified the two matching samples (chance level = 1/10).
# Observed responses
observed_2of5 <- c(Incorrect = 25 - 9, Correct = 9)
# Expected frequencies under the null hypothesis (chance = 1/10)
expected_2of5 <- c(Incorrect = 25 * (1 - 1/10),
                   Correct = 25 * 1/10)
# Chi-squared goodness-of-fit test
chisq.test(x = observed_2of5, p = expected_2of5 / sum(expected_2of5))
# With these counts the statistic is about 18.8 on 1 df, so p < 0.001:
# 9 correct answers out of 25 is far above the 2.5 expected by guessing.
# Because the expected count in the "Correct" cell is below 5, R warns that
# the approximation may be inaccurate; the exact binomial test,
# binom.test(9, 25, p = 1/10), is the safer choice here.
This approach allows you to evaluate various test types by modifying only the expected probabilities, making it easy to adapt the analysis for any forced-choice sensory discrimination method.
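As a sketch of that idea, here is a small hypothetical wrapper (not part of any package) that runs the exact binomial difference test for any forced-choice protocol once its chance probability is supplied; the one-sided alternative reflects that discrimination tests ask only whether performance is better than chance.
# Hypothetical convenience wrapper: exact binomial test at a given chance level
discrim_binom <- function(correct, total, chance) {
  binom.test(correct, total, p = chance, alternative = "greater")
}
# Duo-trio example (chance = 1/2): 20 correct answers out of 30
discrim_binom(20, 30, chance = 1/2)
# Two-out-of-five example (chance = 1/10): 9 correct answers out of 25
discrim_binom(9, 25, chance = 1/10)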
The discrim() function from the sensR package provides a unified and flexible interface for analyzing a wide variety of discrimination protocols. It allows you to test hypotheses about sensory differences using different methods, test types, and statistical frameworks—all with one function.
library(sensR)
discrim_result <- discrim(19, 30, method = "triangle")
print(discrim_result)
##
## Estimates for the triangle discrimination protocol with 19 correct
## answers in 30 trials. One-sided p-value and 95 % two-sided confidence
## intervals are based on the 'exact' binomial test.
##
## Estimate Std. Error Lower Upper
## pc 0.6333 0.08798 0.4386 0.8007
## pd 0.4500 0.13197 0.1578 0.7011
## d-prime 2.1462 0.45527 1.1264 3.1336
##
## Result of difference test:
## 'exact' binomial test: p-value = 0.0007371
## Alternative hypothesis: d-prime is greater than 0
This function internally accounts for the structure of each test (such as chance level, forced-choice nature, and d-prime calculations), making it ideal for standardized analysis and reporting across different test types.
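For example, the two-out-of-five data used earlier can be analyzed through the same interface simply by changing the method argument ("twofive" is assumed here to be the sensR name for the two-out-of-five protocol; see ?discrim for the full list of supported methods):
# Same interface, different protocol: 9 correct answers in 25 two-out-of-five trials
discrim_2of5 <- discrim(9, 25, method = "twofive")
print(discrim_2of5)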
Most sensory discrimination tests are designed to detect differences between products. In such tests, the focus is on minimizing the Type I error (α-risk)—the probability of concluding that a perceptible difference exists when in fact there is none. This approach assumes that Type II error (β-risk) and the proportion of distinguishers (pd) are either negligible or unimportant. Consequently, sample sizes can be kept relatively small.
However, in many industrial and quality control applications, the goal is not to prove that products are different, but rather that they are similar enough to be used interchangeably—for example, when switching suppliers or reformulating for cost savings.
In similarity testing, the focus shifts. The analyst must define what constitutes a meaningful difference by specifying a value for pd, and then chooses a small value for β-risk (Type II error) to ensure the test has high power to detect differences if they exist. In this case, a larger α-risk is tolerated to avoid requiring an excessively large number of assessors (Meilgaard, Civille, and Carr 2016).
library(sensR)
# Let's assume we want to demonstrate similarity (not difference)
# pd = 0.30 is the threshold for a meaningful difference (i.e., no more than 30% distinguishers)
# test = "similarity" enables the appropriate hypothesis test
discrim_sim <- discrim(correct = 17,
total = 30,
pd0 = 0.30,
method = "triangle",
test = "similarity",
statistic = "exact")
print(discrim_sim)
##
## Estimates for the triangle discrimination protocol with 17 correct
## answers in 30 trials. One-sided p-value and 95 % two-sided confidence
## intervals are based on the 'exact' binomial test.
##
## Estimate Std. Error Lower Upper
## pc 0.5667 0.09047 0.37427 0.7454
## pd 0.3500 0.13571 0.06141 0.6181
## d-prime 1.8071 0.45652 0.68033 2.7691
##
## Result of similarity test:
## 'exact' binomial test: p-value = 0.7071
## Alternative hypothesis: pd is less than 0.3
This test evaluates whether the observed proportion of correct responses is low enough to conclude that no meaningful perceptible difference exists between the two samples, given the defined threshold pd0.
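For intuition, the exact similarity test can be reproduced with base R: under the null hypothesis pd = pd0, the probability of a correct answer is pc0 = 1/3 + pd0 * (1 - 1/3), and the test asks whether the observed number of correct answers is improbably low for that value. A sketch, assuming this is how the exact statistic is computed:
# Similarity test "by hand": is 17/30 correct low enough to conclude pd < 0.30?
pg <- 1/3 # guessing probability of the triangle test
pd0 <- 0.30 # similarity limit on the proportion of distinguishers
pc0 <- pg + pd0 * (1 - pg) # implied proportion of correct answers (= 8/15)
binom.test(17, 30, p = pc0, alternative = "less")
# The one-sided p-value should match (up to rounding) the 0.7071 reported above.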