MATH 454: NON-PARAMETRIC METHODS
1 Introduction to Non-parametric Methods
In the theory of estimation and hypothesis testing, we assume that the observations available to the statistician come from a distribution whose exact form is known, even though the values of some parameters are unknown.
In many of the problems to be discussed in non-parametric methods, we shall not assume that the available observations come from a particular parametric family of distributions. We study inferences that can be made about the distribution from which the observations come, without making special assumptions about the form of that distribution.
Non-parametric statistics are methods that make few or no assumptions about the underlying population distribution from which the data are sampled (e.g., no assumption of a normal distribution).
They are “distribution-free” methods.
1.1 Key Characteristics
Flexibility: They work with data that isn’t normally distributed, is ordinal (ranked), or has outliers.
Focus on Ranks: Many non-parametric tests work by ranking the data rather than using the raw data values themselves.
Robustness: They are less affected by violations of assumptions that would invalidate parametric tests.
1.2 When to Use Them
When your data is not normally distributed.
When you have a small sample size.
When your data is on an ordinal scale (e.g., rankings, Likert scales).
When your data contains significant outliers.
1.3 Overview
Statistical tests can be broadly classified into parametric and non-parametric tests. Parametric tests assume underlying statistical distributions (e.g., normality), while non-parametric tests make fewer assumptions and are often used when data doesn’t meet parametric criteria.
1.4 Comparison Table
| Parametric Test (Makes Assumptions) | Non-Parametric Equivalent (Fewer Assumptions) |
|---|---|
| Independent t-test (compares 2 means) | Mann-Whitney U Test (compares 2 medians) |
| Paired t-test (compares means from same group) | Wilcoxon Signed-Rank Test |
| One-Way ANOVA (compares 3+ means) | Kruskal-Wallis H Test |
| Pearson’s Correlation (linear relationship) | Spearman’s Rank Correlation |
1.5 Definitions of Key Concepts in Non-parametric Methods
1. Distribution-Free
The core idea. Methods do not assume the data follows a specific probability distribution (like the normal distribution). This makes them more flexible for real-world data.
2. Ranks
A fundamental data transformation. Instead of using the raw data values, observations are sorted and replaced by their position (rank). This minimizes the effect of outliers and non-normal shapes.
3. Ordinal Data
The type of data these methods often use. Data that can be ordered or ranked (e.g., 1st, 2nd, 3rd; “strongly agree” to “strongly disagree”), but the intervals between points are not necessarily equal.
4. Robustness
A key advantage. Non-parametric methods are less influenced by outliers or violations of common statistical assumptions. They provide reliable results even when the data is “messy.”
5. Median (not Mean)
The typical measure of central tendency. Since they don’t assume a symmetric distribution, non-parametric methods usually test hypotheses about the median rather than the mean.
2 The Sign Test
The Sign Test is a non-parametric hypothesis test that evaluates the median of a population. It uses only the signs (positive or negative) of differences between observations and a hypothesized median, ignoring the magnitude of those differences.
2.1 Key Characteristics
- Simple: Considers only whether values are above or below a hypothesized median.
- Robust: Not affected by outliers or non-normal distributions.
- Uses: Applicable to one-sample or paired data.
- Test Statistic: Based on the binomial distribution of positive/negative signs.
For the purpose of describing the Sign Test we assume that the population has a continuous density \(f(x)\) and that the median, denoted by \(m\), is uniquely defined by
\[ \int_{m}^{\infty} f(x) dx=\frac{1}{2} \]
Let \(X_1 , X_2 , ... ,X_n\) be a random sample from this distribution and consider testing the hypothesis
\(H_0 : m=m_0\) against \(H_1 : m>m_0\).
From the definition of the median, it follows that when \(H_0\) is true,
\[P(X \geq m_0)=P(X-m_0 \geq 0)=\frac{1}{2}\]
and therefore \[P(X_i-m_0 \geq 0)=\frac{1}{2}, \quad i=1,2,3,...,n.\]
Let \[ Z_i = \begin{cases} 1, & \text{if } X_i - m_0 \ge 0 \\ 0, & \text{if } X_i - m_0 < 0 \end{cases} \]
We note that each \(Z_i\) has a Bernoulli distribution with \(P=\frac{1}{2}\) when \(H_0\) is true. Since the \(Z_i\) are independent, the sum \[U=\sum_{i=1}^{n} Z_i\]
will be a binomial random variable corresponding to \(n\) independent Bernoulli trials with \(P=\frac{1}{2}\) when \(H_0\) is true.
Under \(H_1\)
Most \(X_i\) will tend to be larger than \(m_0\), and therefore \(U\) will tend to exceed the value expected when \(H_0\) is true. As a result, the right tail of the binomial distribution should be chosen as the critical region of the test. When \(H_0\) is true,
Let
\[ Z_i = \begin{cases} 1, & \text{with probability } \frac{1}{2} \\ 0, & \text{with probability } \frac{1}{2} \end{cases} \]
\[ E(Z_i)=(0*\frac{1}{2})+(1*\frac{1}{2})=\frac{1}{2}\] \[ E(Z_{i}^{2})=(0^2*\frac{1}{2})+(1^2*\frac{1}{2})=\frac{1}{2}\] \[ U=Z_1+Z_2+...+Z_n\] \[ E(U)=E(Z_1)+E(Z_2)+...+E(Z_n)=\frac{1}{2}+\frac{1}{2}+...+\frac{1}{2}=\frac{n}{2}\] \[ Var(U)=Var(Z_1)+Var(Z_2)+...+Var(Z_n)\] \[ Var(Z_i)=E(Z_i^2)-[E(Z_i)]^2=\frac{1}{2}-\frac{1}{4}=\frac{1}{4}\]
Therefore
\[Var(U)=\frac{1}{4}+\frac{1}{4}+...+\frac{1}{4}=\frac{n}{4}\]
that is, \(E(U)=\frac{n}{2}\) and \(Var(U)=\frac{n}{4}\) when \(H_0\) is true.
For very small values of \(n\), it is necessary to accumulate the binomial probabilities in the right tail until a total of approximately \(\alpha\) is obtained, in order to determine the critical region for the test.
For \(P=\frac{1}{2}\) the binomial distribution is approximated well by the normal distribution even for fairly small values of \(n\). Therefore, it usually suffices to use the normal approximation in these problems. We use the standard normal variable.
For a right tail critical region, we have;
\[Z=\frac{U-\frac{1}{2}-\frac{n}{2}}{\sqrt{\frac{n}{4}}}\]
For a left tail critical region, we have; \[Z=\frac{U+\frac{1}{2}-\frac{n}{2}}{\sqrt{\frac{n}{4}}}\]
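As a quick illustration of these formulas, here is a small R sketch (the helper name sign_test_z is ours, purely illustrative) that computes the continuity-corrected Z statistic for a left-tailed sign test; for a right-tailed test the \(+\frac{1}{2}\) would be replaced by \(-\frac{1}{2}\), as in the formula above.
# Sketch: left-tailed sign test via the normal approximation (hypothetical helper)
sign_test_z <- function(x, m0) {
  d <- x - m0
  d <- d[d != 0]                    # discard zero differences
  U <- sum(d > 0)                   # number of positive signs
  n <- length(d)
  (U + 0.5 - n / 2) / sqrt(n / 4)   # continuity-corrected Z, left tail
}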
2.2 Solved Example
The following data were obtained from testing the breaking strength of ceramic tiles manufactured through a new, cheaper process:
| 20 | 42 | 18 | 21 | 19 | 26 | 21 | 20 | 22 | 35 | 22 | 24 | 18 | 20 | 32 |
|---|
Suppose that experience with the old process produced a median of 25, test the hypothesis at \(\alpha=0.05\). \(H_0 : m=25\) against \(H_1 : m<25\).
Solution
In testing \(H_0 : m=m_0\) against \(H_1 : m<m_0\)
We use the test statistic
\[Z=\frac{U+\frac{1}{2}-\frac{n}{2}}{\sqrt{\frac{n}{4}}}\] The rejection region is the left tail, so we reject \(H_0\) for small values of \(Z\), where \[ U=Z_1+Z_2+...+Z_n\] and each \(Z_i\) records whether the difference \(X_i-m_0\) is non-negative, \(i=1,2,\ldots,n\).
Subtracting 25 from each listed value we have;
| -5 | 17 | -7 | -4 | -6 | 1 | -4 | -5 | -3 | 10 | -3 | -1 | -7 | -5 | 7 |
|---|
Corresponding Z values are;
| 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
|---|
\[U=\sum_{i=1}^{n} Z_i =4\] \(U\) is binomial distribution with \(n=15\) and \(P=\frac{1}{2}\) when \(H_0\) is true.
\[Z=\frac{4+\frac{1}{2}-\frac{15}{2}}{\sqrt{\frac{15}{4}}}=-1.55\]
From the normal distribution table we have \(P(Z\leq -1.645)=0.05\), so the critical value is \(Z_{critical}=-1.645\).
Conclusion
Since \(-1.55 > -1.645\), we fail to reject \(H_0\); there is insufficient evidence to conclude that the median of the new cheaper process is less than 25 at the 0.05 significance level.
Remarks
Instead of recording the values of \(X_i\) in the above example, we can simply record the signs of the differences \(X_i-25\) rather than the \(X_i\) values, in which case we would get
| - | + | - | - | - | + | - | - | - | + | - | - | - | - | + |
|---|
The number of positive signs is 4, and the total number of positive signs gives the value \(U=4\). It is for this reason that the test is called the Sign Test.
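If R is available, the same conclusion can be checked with an exact binomial test; binom.test is a base-R function, and the exact one-sided p-value here, \(P(X \leq 4)\) for \(X \sim Binomial(15, 0.5)\), is about 0.059, again failing to reject \(H_0\) at the 0.05 level.
# Verify Example 2.2 with an exact binomial sign test (base R)
strength <- c(20, 42, 18, 21, 19, 26, 21, 20, 22, 35, 22, 24, 18, 20, 32)
signs <- strength - 25
U <- sum(signs > 0)                               # 4 positive signs
n <- sum(signs != 0)                              # 15 non-zero differences
binom.test(U, n, p = 0.5, alternative = "less")   # exact p-value ≈ 0.059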
2.3 Solved Example
The following data in tonnes are the amounts of sulphur oxides emitted by a large industrial plant in 40 days.
| 17 | 15 | 20 | 29 | 19 | 18 | 22 | 25 | 27 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 24 | 20 | 17 | 6 | 24 | 14 | 15 | 23 | 24 | 26 |
| 19 | 23 | 28 | 19 | 16 | 22 | 24 | 17 | 20 | 13 |
| 19 | 10 | 23 | 18 | 31 | 13 | 20 | 17 | 24 | 14 |
Test the hypothesis at \(\alpha=0.01\). \(H_0 : m=21.5\) against \(H_1 : m<21.5\).
Solution
In testing \(H_0 : m=21.5\) against \(H_1 : m<21.5\)
We use the test statistic
\[Z=\frac{U+\frac{1}{2}-\frac{n}{2}}{\sqrt{\frac{n}{4}}}\] The rejection region is the left tail, so we reject \(H_0\) for small values of \(Z\), where \[U=Z_1+Z_2+...+Z_n\] The signs of the differences \(X_i - 21.5\), \(i=1,2,\ldots,n\), are as follows:
| - | - | - | + | - | - | + | + | + | - |
|---|---|---|---|---|---|---|---|---|---|
| + | - | - | - | + | - | - | + | + | + |
| - | + | + | - | - | + | + | - | - | - |
| - | - | + | - | + | - | - | - | + | - |
The number of positive signs is 16; \(U=\sum Z_i=16\)
\(U\) is binomial with \(n=40\) and \(P=\frac{1}{2}\) when \(H_0\) is true.
The test statistic
\[Z=\frac{16+\frac{1}{2}-\frac{40}{2}}{\sqrt{\frac{40}{4}}}=-1.107\]
From the normal distribution table, the critical value at \(\alpha=0.01\) is \(-2.33\).
Conclusion
Since \(-1.107 > -2.33\), we fail to reject \(H_0\); there is insufficient evidence to conclude that the median is less than 21.5 at the 0.01 significance level.
The following example shows how to compute the one-sample Sign Test manually, step by step, without using R.
2.4 Solved Example: One-Sample Sign Test
Test if the median exam score is greater than 75 using α = 0.05
Data: 78, 82, 65, 90, 72, 88, 79, 85, 95, 70
Hypothesized Median (H₀): 75
Significance Level (α): 0.05
Alternative Hypothesis (H₁): Median > 75 (one-tailed)
Solution
- Subtract Hypothesized Median from Each Value
| Score | Difference (Score - 75) | Sign |
|---|---|---|
| 78 | +3 | + |
| 82 | +7 | + |
| 65 | -10 | – |
| 90 | +15 | + |
| 72 | -3 | – |
| 88 | +13 | + |
| 79 | +4 | + |
| 85 | +10 | + |
| 95 | +20 | + |
| 70 | -5 | – |
- Count Signs
Positive signs (+): 7
Negative signs (–): 3
Zero differences: 0
Total non-zero differences (n): 10
- Apply Binomial Test
Under H₀, the probability of a positive sign is 0.5.
We test whether the number of positive signs (7 out of 10) is significantly greater than expected.
Use the binomial formula or a calculator to find:
p-value = P(X ≥ 7) where X ~ Binomial(n = 10, p = 0.5)
\[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\]
So,
\[P(X \geq 7) = \sum_{k=7}^{10} \binom{10}{k}(0.5)^{10}=0.1719\]
- Decision
Since p-value = 0.1719 > α = 0.05,
Fail to reject H₀: There is insufficient evidence to conclude that the median is greater than 75.
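A quick base-R check of this p-value (data entered as above) uses binom.test:
# Verify Example 2.4: exact binomial sign test, upper tail
scores <- c(78, 82, 65, 90, 72, 88, 79, 85, 95, 70)
U <- sum(scores - 75 > 0)                             # 7 positive signs
binom.test(U, 10, p = 0.5, alternative = "greater")   # exact p-value ≈ 0.1719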
2.5 Solved Example: One-Sample Sign Test
Test if the median exam score is greater than 75 using α = 0.05
Data: 78, 82, 74, 79, 81, 76, 83, 77, 80, 72
Step-by-Step Analysis
# Data
scores <- c(78, 82, 74, 79, 81, 76, 83, 77, 80, 72)
hyp_median <- 75
# Step 2: Calculate signs
signs <- scores - hyp_median
signs_nonzero <- signs[signs != 0]
positive_signs <- sum(signs_nonzero > 0)
negative_signs <- sum(signs_nonzero < 0)
n <- length(signs_nonzero)
# Step 3: Binomial test (one-tailed)
binom_result <- binom.test(positive_signs, n, p = 0.5, alternative = "greater")
# Output results
cat("Positive signs:", positive_signs, "\n")
## Positive signs: 8
cat("Negative signs:", negative_signs, "\n")
## Negative signs: 2
cat("Total non-zero differences:", n, "\n")
## Total non-zero differences: 10
cat("p-value:", binom_result$p.value, "\n")
## p-value: 0.0546875
# Step 5: Decision
if (binom_result$p.value < 0.05) {
cat("Reject H₀: There is sufficient evidence that the median > 75.\n")
} else {
cat("Fail to reject H₀: Insufficient evidence that the median > 75.\n")
}
## Fail to reject H₀: Insufficient evidence that the median > 75.
2.6 Exercise & Assignment
- Test if median reaction time is less than 300ms (α = 0.05)
Data: 280, 295, 310, 275, 290, 285, 305, 298, 282, 315
- A quality control manager wants to test if the median weight of packaged products equals 500g. She randomly selects 15 packages:
Weights (g): 495, 502, 498, 505, 496, 503, 499, 501, 497, 504, 500, 506, 494, 507, 493
Required:
Test using both Sign Test and Signed-Rank Test
Apply normal approximation for the Signed-Rank Test
Calculate 95% confidence interval for the median using both methods
Compare the power of both tests
Write a comprehensive statistical report with recommendations
3 The Wilcoxon Signed-Rank Test
As we have seen, the Sign Test is very easy to perform, but since it uses only the signs of the differences between the observations and \(m_0\) in the one-sample case, it tends to be wasteful of information. An alternative non-parametric test, the Wilcoxon Signed-Rank Test, is less wasteful in that it also takes into account the magnitudes of the differences. In this test, we rank the differences without regard to their signs, assigning rank 1 to the smallest difference in absolute value, rank 2 to the second smallest difference in absolute value, and so on, up to rank \(n\) for the largest difference in absolute value. Zero differences are discarded. If the absolute values of two or more differences are the same, we assign each one the mean of the ranks which they jointly occupy. The signed-rank test is based on:
\(T^+\) is the sum of the ranks assigned to the positive differences.
\(T^-\) is the sum of the ranks assigned to the negative differences.
\(T^+\), \(T^-\), or \(T= \min(T^+, T^-)\), depending on the alternative hypothesis.
We note that \(T^-\) and \(T^+\) take values in the interval 0 to \(\frac{n(n+1)}{2}\),
and their distributions are symmetric about \(\frac{n(n+1)}{4}\).
The critical regions for the various tests are given below.
| Alternative Hypothesis | Critical/Rejection Region |
|---|---|
| \(m \neq m_0\) | T ≤ Tα |
| \(m > m_0\) | T⁻ ≤ T₂α |
| \(m < m_0\) | T⁺ ≤ T₂α |
The Wilcoxon Signed-Rank Test is a non-parametric statistical test used to compare two related samples (paired data) or to test whether the median of a single sample differs from a specified value. It considers both the direction and magnitude of differences, making it more powerful than the Sign Test.
3.1 Key Features:
More powerful than Sign Test (uses rank information)
Uses both direction and magnitude of differences
Tests median differences for paired data or single sample
Assumes symmetric distribution of differences
3.2 Solved Example
The following are 15 measurements of the octane rating of a certain kind of gasoline.
| 97.5 | 95.2 | 97.3 | 96.0 | 98.2 |
|---|---|---|---|---|
| 93.2 | 99.1 | 96.1 | 97.0 | 100.3 |
| 97.4 | 94.5 | 95.3 | 96.8 | 98.5 |
Required
Use the signed-rank test at the 0.05 level of significance to test whether or not the mean octane rating of the given gasoline is 98.5.
Solution
In testing \(H_0 : \mu=98.5\) against \(H_1 : \mu<98.5\)
We reject \(H_0\) if \(T\leq T_\alpha =T_{0.05}\) for the appropriate value of \(n\).
We subtract 98.5 from every value and rank the absolute difference
| Measurement | Differences | Absolute Differences | Rank |
|---|---|---|---|
| 97.5 | -1.0 | 1.0 | 3 |
| 95.2 | -3.3 | 3.3 | 12 |
| 97.3 | -1.2 | 1.2 | 5 |
| 96.0 | -2.5 | 2.5 | 10 |
| 98.2 | -0.3 | 0.3 | 1 |
| 93.2 | -5.3 | 5.3 | 14 |
| 99.1 | 0.6 | 0.6 | 2 |
| 96.1 | -2.4 | 2.4 | 9 |
| 97.0 | -1.5 | 1.5 | 6 |
| 100.3 | 1.8 | 1.8 | 8 |
| 97.4 | -1.1 | 1.1 | 4 |
| 94.5 | -4.0 | 4.0 | 13 |
| 95.3 | -3.2 | 3.2 | 11 |
| 96.8 | -1.7 | 1.7 | 7 |
| 98.5 | 0.0 | 0.0 | - |
The value of 98.5 with zero difference is dropped.
\(T^+=2+8=10\)
\(T^-=1+3+4+5+6+7+9+10+11+12+13+14=95\)
\(T= \min(T^+, T^-)=\min(10, 95)=10\)
For n=14, we find that \(T_{0.05}=21\)
Conclusion
Since \(10<21\), we reject \(H_0\) at the 0.05 level of significance.
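If R is available, the same data can be run through wilcox.test (a base-R function); because one difference is exactly zero, R drops the exact method with a warning and falls back on a normal approximation, so its p-value is only an approximate check on the table-based decision above.
# Check the octane example with wilcox.test; expect a warning about the zero difference
octane <- c(97.5, 95.2, 97.3, 96.0, 98.2, 93.2, 99.1, 96.1, 97.0, 100.3,
            97.4, 94.5, 95.3, 96.8, 98.5)
wilcox.test(octane, mu = 98.5, alternative = "less")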
3.3 Solved Example: One-Sample Signed-Rank Test
Test if the median battery life differs from 100 hours (α = 0.05)
Data: 95, 105, 98, 112, 88, 102, 108, 92, 115, 96
Solution
Step 1: State Hypotheses
H₀: Median = 100 hours
H₁: Median ≠ 100 hours (two-tailed)
Step 2: Calculate Differences from 100
| Battery Life | Difference (Xᵢ - 100) | Absolute Difference | Rank | Signed Rank |
|---|---|---|---|---|
| 95 | -5 | 5 | 4.5 | -4.5 |
| 105 | +5 | 5 | 4.5 | +4.5 |
| 98 | -2 | 2 | 1.5 | -1.5 |
| 112 | +12 | 12 | 8.5 | +8.5 |
| 88 | -12 | 12 | 8.5 | -8.5 |
| 102 | +2 | 2 | 1.5 | +1.5 |
| 108 | +8 | 8 | 6.5 | +6.5 |
| 92 | -8 | 8 | 6.5 | -6.5 |
| 115 | +15 | 15 | 10 | +10 |
| 96 | -4 | 4 | 3 | -3 |
Step 3: Handle Ties in Absolute Differences
|2| appears twice → average ranks (1+2)/2 = 1.5
|5| appears twice → average ranks (4+5)/2 = 4.5
|8| appears twice → average ranks (6+7)/2 = 6.5
|12| appears twice → average ranks (8+9)/2 = 8.5
Step 4: Calculate Rank Sums
W⁺ (sum of positive ranks) = 4.5 + 8.5 + 1.5 + 6.5 + 10 = 31
W⁻ (sum of negative ranks) = 4.5 + 1.5 + 8.5 + 6.5 + 3 = 24
Step 5: Test Statistic
W = min(W⁺, W⁻) = min(31, 24) = 24
Step 6: Critical Value (n = 10, α = 0.05, two-tailed)
From Wilcoxon table: Critical value = 8
Step 7: Decision Rule
Reject H₀ if W ≤ critical value
24 > 8 ⇒ Fail to reject H₀
Conclusion: No evidence that median battery life differs from 100 hours.
3.4 Example 2: Larger Sample with Normal Approximation
Test if median response time is less than 50ms (α = 0.05)
Data: 15 response times: 45, 52, 38, 55, 48, 42, 57, 39, 51, 44, 49, 41, 53, 47, 46
Solution
Step 1: State Hypotheses
H₀: Median = 50 ms
H₁: Median < 50 ms (one-tailed)
Step 2: Calculate Differences and Ranks
| Time | Diff | Absolute Value | Rank | Signed Rank |
|---|---|---|---|---|
| 45 | -5 | 5 | 8.5 | -8.5 |
| 52 | +2 | 2 | 3.5 | +3.5 |
| 38 | -12 | 12 | 15 | -15 |
| 55 | +5 | 5 | 8.5 | +8.5 |
| 48 | -2 | 2 | 3.5 | -3.5 |
| 42 | -8 | 8 | 12 | -12 |
| 57 | +7 | 7 | 11 | +11 |
| 39 | -11 | 11 | 14 | -14 |
| 51 | +1 | 1 | 1.5 | +1.5 |
| 44 | -6 | 6 | 10 | -10 |
| 49 | -1 | 1 | 1.5 | -1.5 |
| 41 | -9 | 9 | 13 | -13 |
| 53 | +3 | 3 | 5.5 | +5.5 |
| 47 | -3 | 3 | 5.5 | -5.5 |
| 46 | -4 | 4 | 7 | -7 |
Step 3: Handle Ties
|1| appears twice → rank (1+2)/2 = 1.5
|2| appears twice → rank (3+4)/2 = 3.5
|3| appears twice → rank (5+6)/2 = 5.5
|5| appears twice → rank (8+9)/2 = 8.5
Step 4: Calculate Rank Sums
W⁺ = 3.5 + 8.5 + 11 + 1.5 + 5.5 = 30
W⁻ = 8.5 + 15 + 3.5 + 12 + 14 + 10 + 1.5 + 13 + 5.5 + 7 = 90
W = W⁺ = 30 (testing for values less than 50)
Step 5: Normal Approximation (n = 15 > 10)
For n > 10, we can use the normal approximation:
Mean: μ = n(n+1)/4 = 15×16/4 = 60
Variance: σ² = n(n+1)(2n+1)/24 = 15×16×31/24 = 310
σ = √310 = 17.61
Z = (W - μ)/σ = (30 - 60)/17.61 = -30/17.61 = -1.70
Step 6: Critical Value and Decision
One-tailed critical Z-value at α = 0.05: -1.645
-1.70 < -1.645 ⇒ Reject H₀
Conclusion: Significant evidence that median response time is less than 50ms.
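Both worked examples above can be cross-checked with base R's wilcox.test; because of the tied absolute differences it falls back on a normal approximation and issues a warning, so its p-values may differ slightly from the hand computations.
# Example 3.3: two-sided test of median battery life = 100
battery <- c(95, 105, 98, 112, 88, 102, 108, 92, 115, 96)
wilcox.test(battery, mu = 100, alternative = "two.sided")
# Example 3.4: one-sided test of median response time < 50
times <- c(45, 52, 38, 55, 48, 42, 57, 39, 51, 44, 49, 41, 53, 47, 46)
wilcox.test(times, mu = 50, alternative = "less")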
Remarks
Handling Ties: Average ranks for tied absolute differences
Zero Differences: Discard from analysis (reduce effective n)
Small Samples (n ≤ 10): Use exact critical values from tables
Large Samples (n > 10): Use normal approximation
One-tailed vs Two-tailed: Choose W⁺ or W⁻ based on alternative hypothesis
The Signed-Rank Test provides robust median testing while being more powerful than the Sign Test.
3.5 Exercise & Assignments
A fitness trainer wants to test if a new workout program increases flexibility. She measures the maximum stretch (in cm) of 8 participants before and after 4 weeks of training:
| Participant | Before | After |
|---|---|---|
| 1 | 45 | 48 |
| 2 | 52 | 55 |
| 3 | 38 | 42 |
| 4 | 60 | 58 |
| 5 | 47 | 52 |
| 6 | 55 | 60 |
| 7 | 42 | 45 |
| 8 | 51 | 54 |
Require:
Perform the Sign Test manually and draw conclusion
Perform the Wilcoxon Signed-Rank Test manually and draw conclusion
Compare the results and explain why they might differ
Verify both tests using R
Which test is more appropriate here and why?
Calculate the effect size for the Signed-Rank Test
4 Paired Test with Small Sample
The Paired Signed-Rank Test (Wilcoxon Signed-Rank Test for paired samples) is a non-parametric statistical test used to compare two related samples or repeated measurements on a single sample. It tests whether the median difference between pairs is zero.
Key Features:
Non-parametric alternative to paired t-test
Uses ranks of absolute differences
Considers both direction and magnitude
Assumes symmetric distribution of differences
More powerful than Sign Test
We have the observation \((x_1, y_1), (x_2, y_2), ..., (x_n, y_n)\)
Let \(\mu_1\) be the mean of the \(X\)'s and \(\mu_2\) be the mean of the \(Y\)'s. We want to test \(H_0 : \mu_1= \mu_2\) against \(H_1 : \mu_1\neq \mu_2\) or \(\mu_1 > \mu_2\) or \(\mu_1 < \mu_2\).
We use the following table
| Alternative Hypothesis | Critical/Rejection Region |
|---|---|
| μ₁ ≠ μ₂ | T ≤ Tα |
| μ₁ > μ₂ | T⁻ ≤ T₂α |
| μ₁ < μ₂ | T⁺ ≤ T₂α |
For \(n\geq 15\) it is considered reasonable to assume that the distribution of \(T^+\) is approximately normal. To perform the signed-rank test based on this assumption, we need the following results.
THEOREM
The mean and variance of \(T^+\) are \[ E(T^+)=\frac{n(n+1)}{4}\] \[ Var(T^+)=\frac{n(n+1)(2n+1)}{24}\]
Proof
If the sample is of size \(n\), then the ranks to be assigned are \(1, 2, 3, \ldots, n\). For each rank, the probability that it will be attached to a positive difference and the probability that it will be attached to a negative difference are both \(\frac{1}{2}\) when \(H_0\) is true.
Let \[Z_i=\begin{cases} 1, & \text{if the difference receiving rank } i \text{ is positive} \\ 0, & \text{if the difference receiving rank } i \text{ is negative} \end{cases}\] for \(i=1,2,3,...,n\).
\[T^+=1*Z_1 + 2*Z_2 + ... +n*Z_n\] where \(Z_1, Z_2, ..., Z_n\) are independent random variables having a Bernoulli distribution with \(P=\frac{1}{2}\).
Since \(E(Z_i)=\frac{1}{2}\) and \(Var(Z_i)=\frac{1}{4}\),
\[E(T^+)=E(1*Z_1 + 2*Z_2 + ... +n*Z_n)\] \[E(T^+)=1E(Z_1) + 2E(Z_2) + ... +nE(Z_n)\] \[E(T^+)=1*\frac{1}{2}+2*\frac{1}{2}+ ...+ n*\frac{1}{2}\] \[E(T^+)=\frac{1+2+...+n}{2}=\frac{n(n+1)}{4}\]
\[Var(T^+)=Var(1*Z_1 + 2*Z_2 + ... +n*Z_n)\] \[Var(T^+)=1^2Var(Z_1) + 2^2Var(Z_2) + ... +n^2Var(Z_n)\] \[Var(T^+)=1*\frac{1}{4}+4*\frac{1}{4}+ ...+ n^2*\frac{1}{4}\] \[Var(T^+)=\frac{1^2+2^2+...+n^2}{4}=\frac{n(n+1)(2n+1)}{24}\]
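As an illustration of the theorem, the sketch below (the helper name paired_signed_rank_z is ours, not from any package) ranks the absolute paired differences, forms \(T^+\), and standardizes it with the mean and variance just derived.
# Sketch: normal-approximation signed-rank test for paired data (hypothetical helper)
paired_signed_rank_z <- function(x, y) {
  d <- x - y
  d <- d[d != 0]                        # drop zero differences
  r <- rank(abs(d))                     # ranks of absolute differences (ties averaged)
  T_plus <- sum(r[d > 0])               # sum of ranks of the positive differences
  n <- length(d)
  mu <- n * (n + 1) / 4                 # E(T+) under H0
  v  <- n * (n + 1) * (2 * n + 1) / 24  # Var(T+) under H0
  (T_plus - mu) / sqrt(v)               # approximately standard normal under H0
}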
4.1 Solved Example
The following are the weights in pounds, before and after, of 16 persons who stayed on a certain reducing diet for 4 weeks.
| Before | After |
|---|---|
| 147.0 | 137.9 |
| 183.5 | 176.2 |
| 232.1 | 219.0 |
| 161.6 | 163.8 |
| 197.5 | 193.5 |
| 206.3 | 201.4 |
| 177.0 | 180.6 |
| 215.4 | 203.2 |
| 147.7 | 149.0 |
| 208.1 | 195.4 |
| 166.8 | 158.3 |
| 131.8 | 134.4 |
| 150.3 | 149.3 |
| 197.2 | 189.1 |
| 159.8 | 159.1 |
| 171.7 | 173.2 |
Required
Use the signed rank test to test at 0.05 level of significance whether the weight reducing diet is effective.
Solution
We want to test \(H_0 : \mu_1= \mu_2\) against \(H_1 :\mu_1> \mu_2\)
We reject \(H_0\) if \(Z\geq Z_{0.05}=1.645\)
Where test statistics
\[Z=\frac{T^+-E(T^+)}{\sqrt{Var(T^+)}}\]
The difference between the respective pair are
| Pair | Differences | Absolute Differences | Rank |
|---|---|---|---|
| 1 | 9.1 | 9.1 | 13 |
| 2 | 7.3 | 7.3 | 10 |
| 3 | 13.1 | 13.1 | 16 |
| 4 | -2.2 | 2.2 | 5 |
| 5 | 4.0 | 4.0 | 8 |
| 6 | 4.9 | 4.9 | 9 |
| 7 | -3.6 | 3.6 | 7 |
| 8 | 12.2 | 12.2 | 14 |
| 9 | -1.3 | 1.3 | 3 |
| 10 | 12.7 | 12.7 | 15 |
| 11 | 8.5 | 8.5 | 12 |
| 12 | -2.6 | 2.6 | 6 |
| 13 | 1.0 | 1.0 | 2 |
| 14 | 8.1 | 8.1 | 11 |
| 15 | 0.7 | 0.7 | 1 |
| 16 | -1.5 | 1.5 | 4 |
\(T^+=13+10+16+8+9+14+15+12+2+11+1=111\)
\[E(T^+)=\frac{n(n+1)}{4}=\frac{16*17}{4}=68\]
\[Var(T^+)=\frac{n(n+1)(2n+1)}{24}=\frac{16*17*33}{24}=374\]
\[Z=\frac{111-68}{\sqrt{374}}=2.22\]
Conclusion
Since \(2.22> 1.645\), we reject the \(H_0\) at 0.05 Significance level.
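Applying the sketch given after the theorem above to these data reproduces \(Z \approx 2.22\); wilcox.test gives a comparable result, though its exact method is used here since there are no ties or zero differences.
# Verify the reducing-diet example
before <- c(147.0, 183.5, 232.1, 161.6, 197.5, 206.3, 177.0, 215.4,
            147.7, 208.1, 166.8, 131.8, 150.3, 197.2, 159.8, 171.7)
after  <- c(137.9, 176.2, 219.0, 163.8, 193.5, 201.4, 180.6, 203.2,
            149.0, 195.4, 158.3, 134.4, 149.3, 189.1, 159.1, 173.2)
paired_signed_rank_z(before, after)                       # manual normal approximation, ≈ 2.22
wilcox.test(before, after, paired = TRUE, alternative = "greater")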
4.2 Solved Example: Paired Signed-Rank Test
Test if a diet program reduces weight (α = 0.05)
Data: 8 participants’ weights before and after
| Participant | Before | After | Difference | Absolute Value | Rank | Signed Rank |
|---|---|---|---|---|---|---|
| 1 | 185 | 175 | -10 | 10 | 6.5 | -6.5 |
| 2 | 192 | 190 | -2 | 2 | 1.5 | -1.5 |
| 3 | 210 | 195 | -15 | 15 | 8 | -8 |
| 4 | 178 | 180 | +2 | 2 | 1.5 | +1.5 |
| 5 | 165 | 160 | -5 | 5 | 3.5 | -3.5 |
| 6 | 195 | 185 | -10 | 10 | 6.5 | -6.5 |
| 7 | 200 | 195 | -5 | 5 | 3.5 | -3.5 |
| 8 | 188 | 182 | -6 | 6 | 5 | -5 |
Solution
Step 1: State Hypotheses
H₀: Median difference = 0 (no weight change)
H₁: Median difference < 0 (weight loss)
Step 2: Calculate Differences and Ranks (Shown in table above)
Step 3: Handle Ties
|2| appears twice → average ranks (1+2)/2 = 1.5
|5| appears twice → average ranks (3+4)/2 = 3.5
|10| appears twice → average ranks (6+7)/2 = 6.5
Step 4: Calculate Rank Sums
W⁺ (positive ranks) = 1.5
W⁻ (negative ranks) = 6.5 + 1.5 + 8 + 3.5 + 6.5 + 3.5 + 5 = 34.5
Step 5: Test Statistic
For one-tailed test: W = W⁺ = 1.5 (since we’re testing for decrease)
Step 6: Critical Value (n = 8, α = 0.05, one-tailed)
From Wilcoxon table: Critical value = 3
Step 7: Decision Rule
Reject H₀ if W ≤ critical value
since 1.5 ≤ 3 ⇒ Reject H₀
Conclusion: Significant evidence that the diet program reduces weight.
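As a check, applying wilcox.test to these paired data (testing whether the after-minus-before differences are negative) should also reject \(H_0\); because of the tied absolute differences R warns that it is using a normal approximation.
# Verify Example 4.2 (weight loss)
before <- c(185, 192, 210, 178, 165, 195, 200, 188)
after  <- c(175, 190, 195, 180, 160, 185, 195, 182)
wilcox.test(after, before, paired = TRUE, alternative = "less")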
4.3 Solved Example:
Test if a new teaching method improves mathematics scores (α = 0.05)
Data: 8 students’ scores before and after training
| Student | Before | After | Difference | AV | Rank | Signed Rank |
|---|---|---|---|---|---|---|
| 1 | 65 | 72 | +7 | 7 | 6 | +6 |
| 2 | 58 | 65 | +7 | 7 | 6 | +6 |
| 3 | 72 | 68 | -4 | 4 | 1.5 | -1.5 |
| 4 | 80 | 85 | +5 | 5 | 3.5 | +3.5 |
| 5 | 68 | 75 | +7 | 7 | 6 | +6 |
| 6 | 75 | 80 | +5 | 5 | 3.5 | +3.5 |
| 7 | 62 | 70 | +8 | 8 | 8 | +8 |
| 8 | 70 | 74 | +4 | 4 | 1.5 | +1.5 |
Solution
Step 1: State Hypotheses
H₀: Median difference = 0 (no improvement)
H₁: Median difference > 0 (improvement) - one-tailed
Step 2: Calculate Differences and Absolute Values
(Shown in table above)
Step 3: Rank Absolute Differences
Handle ties by averaging ranks:
|4| appears twice: ranks (1+2)/2 = 1.5
|5| appears twice: ranks (3+4)/2 = 3.5
|7| appears three times: ranks (5+6+7)/3 = 6
|8| appears once: rank 8
Step 4: Assign Signed Ranks
(Shown in the table above)
Step 5: Calculate Rank Sums
W⁺ = 6 + 6 + 3.5 + 6 + 3.5 + 8 + 1.5 = 34.5
W⁻ = 1.5
W = min(W⁺, W⁻) = 1.5
Step 6: Critical Value (n=8, α=0.05, one-tailed)
From Wilcoxon table: Critical value = 3
Step 7: Decision
W = 1.5 ≤ 3 ⇒ Reject H₀
Conclusion: Significant evidence of improvement at the 5% level.
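The corrected rank sums can be confirmed in R; wilcox.test should report V = W⁺ = 34.5 here and, with its tie-corrected normal approximation, a p-value below 0.05, in line with rejecting \(H_0\).
# Verify Example 4.3 (teaching method): test whether after - before > 0
before <- c(65, 58, 72, 80, 68, 75, 62, 70)
after  <- c(72, 65, 68, 85, 75, 80, 70, 74)
wilcox.test(after, before, paired = TRUE, alternative = "greater")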
4.4 Exercise & Assignment
- A fitness trainer wants to test if a new workout program increases flexibility. She measures the maximum stretch (in cm) of 8 participants before and after 4 weeks of training:
| Participant | Before | After |
|---|---|---|
| 1 | 45 | 48 |
| 2 | 52 | 55 |
| 3 | 38 | 42 |
| 4 | 60 | 58 |
| 5 | 47 | 52 |
| 6 | 55 | 60 |
| 7 | 42 | 45 |
| 8 | 51 | 54 |
Required
Perform the paired Signed-Rank Test manually and draw conclusion
Verify both tests using R
- A clinical trial tested a new drug’s effect on pain reduction. Patients rated their pain level (1-10 scale) before and after treatment:
Data:
| Patient | Before | After |
|---|---|---|
| 1 | 8 | 5 |
| 2 | 7 | 6 |
| 3 | 9 | 4 |
| 4 | 6 | 5 |
| 5 | 8 | 3 |
| 6 | 7 | 4 |
| 7 | 9 | 6 |
| 8 | 5 | 4 |
| 9 | 8 | 5 |
| 10 | 7 | 3 |
| 11 | 6 | 2 |
| 12 | 8 | 4 |
| 13 | 7 | 5 |
| 14 | 9 | 7 |
| 15 | 8 | 6 |
Required:
Perform manual paired Signed-Rank Test with normal approximation
Construct 95% confidence interval for median difference
Write comprehensive statistical report
- The accompanying table gives the scores of a group of 15 students in Mathematics and Science
| Student | Mathematics | Science |
|---|---|---|
| 1 | 22 | 33 |
| 2 | 37 | 68 |
| 3 | 36 | 42 |
| 4 | 38 | 49 |
| 5 | 42 | 51 |
| 6 | 58 | 65 |
| 7 | 58 | 65 |
| 8 | 60 | 71 |
| 9 | 62 | 55 |
| 10 | 65 | 74 |
| 11 | 66 | 68 |
| 12 | 56 | 64 |
| 13 | 66 | 67 |
| 14 | 67 | 73 |
| 15 | 62 | 65 |
Required
Use Wilcoxon’s signed-rank test to determine if the locations of the distributions of scores differ significantly for the two subjects at \(\alpha = 0.05\)
5 The Rank Sum Test (The U Test / Mann-Whitney Test)
The \(U\)-test is used for two samples. Other names for the \(U\)-test are:
- Wilcoxon rank-sum test
- Mann-Whitney test
We are going to test the hypothesis that we are sampling identical continuous populations against the alternative that the two populations have unequal means. As originally proposed by Wilcoxon, the test is based on the values of:
\(W_1\) - the sum of the ranks of the first sample
\(W_2\) - the sum of the ranks of the second sample
It does not matter which of the two we use. If there are \(n_1\) values in the first sample and \(n_2\) values in the second sample, then
\(W_1+W_2\) is the sum of the first \(n_1+n_2\) integers, that is
\[W_1+W_2=\frac{(n_1+n_2)(n_1+n_2 +1)}{2}\]
for any pair of values of \(W_1\) and \(W_2\).
Thus tests based on \(W_1\) and \(W_2\) are equivalent.
In actual practice, we seldom use tests based on the statistics \(W_1\) and \(W_2\); instead, we use the related statistics
\[U_1=W_1 -\frac{(n_1)(n_1 +1)}{2}\] \[U_2=W_2 -\frac{(n_2)(n_2 +1)}{2}\]
or the Test statistics
\(U=min(U_1, U_2)\)
The resulting tests are all equivalent to the one based on \(W_1\) and \(W_2\), but they have the advantage that they lend themselves more readily to the construction of tables of critical values.
\[U_1 + U_2 =W_1 -\frac{(n_1)(n_1 +1)}{2} +W_2 -\frac{(n_2)(n_2 +1)}{2}\] \[U_1 + U_2 =\frac{(n_1 +n_2)(n_1 +n_2 + 1)}{2} -\frac{(n_1)(n_1 +1)}{2} -\frac{(n_2)(n_2 +1)}{2}\] \[U_1 + U_2 =n_1n_2\]
that is, The sum of \(U_1 + U_2\) is always \(n_1n_2\) and both of these random variables take on the same range of values from \(0\) to \(n_1n_2\).
We use the following table for testing \(H_0 : \mu_1= \mu_2\) against \(H_1 : \mu_1\neq \mu_2\) or \(\mu_1> \mu_2\) or \(\mu_1< \mu_2\)
| Alternative Hypothesis | Critical / Rejection Region |
|---|---|
| μ₁ ≠ μ₂ | U ≤ Uₐ |
| μ₁ > μ₂ | U₂ ≤ U₂ₐ |
| μ₁ < μ₂ | U₁ ≤ U₂ₐ |
where the level of significance is \(\alpha\) for each test. The critical values of \(U\) are such that \(U_\alpha\) is the largest value for which \(P(U\leq U_\alpha)\) does not exceed \(\alpha\).
The Mann-Whitney U test is a nonparametric test of the null hypothesis that two independent samples come from the same population against an alternative hypothesis, typically that two populations have different medians.
The Mann–Whitney U test (also called the Wilcoxon rank-sum test) is a non-parametric test used to compare whether two independent samples come from the same population (i.e., whether their distributions are identical).
It is often used as an alternative to the independent samples t-test when the assumptions of normality or equal variances are not satisfied
Key Points / Notes
Non-parametric alternative to independent t-test
Compares two independent groups
Uses ranks rather than raw data values
Tests stochastic superiority (whether one group tends to have higher values)
No normality assumption required
Hypotheses:
H₀: The two samples come from identical populations (the medians are equal)
H₁: The two populations differ (typically, one tends to yield larger values, i.e., a different median)
Assumptions:
Independent random samples
Ordinal or continuous data
Shape of distributions should be similar
Procedure:
Combine the two groups into one dataset.
Assign ranks to all observations (average ranks in case of ties).
Compute the sum of ranks for each group.
Use the rank sums to calculate the U statistic:
\[U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1\]
\[U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - R_2\]
\[U = \min(U_1, U_2)\]
where:
- \(n_1\) = size of group 1
- \(R_1\) = sum of ranks in group 1
- Similarly, compute \(U\) for group 2
- The smaller of the two \(U\) values is the test statistic
Decision Rule
- For small samples: Compare the observed \(U\) with critical values from Mann–Whitney tables
- For large samples: Convert to a z-score and use the normal approximation
Interpretation
- A small U indicates that the distributions differ (one group tends to have systematically higher/lower values than the other)
Advantages:
Doesn’t assume normality.
Works with ordinal data or continuous data.
More robust than the t-test under non-normal distributions.
5.1 Solved Example
Two samples of flames had the following burning times.
| Brand A | Brand B |
|---|---|
| 14.9 | 15.2 |
| 11.3 | 19.8 |
| 13.2 | 14.7 |
| 16.6 | 18.3 |
| 17.0 | 16.2 |
| 14.1 | 21.2 |
| 15.4 | 18.9 |
| 13.0 | 12.2 |
| 16.9 | 15.3 |
| – | 19.4 |
Test whether the two samples come from identical continuous populations or whether the mean burning time of Brand A is less than that of Brand B flames at \(\alpha=0.05\)
Solution
We wish to test \(H_0 : \mu_1= \mu_2\) against \(H_1 :\mu_1< \mu_2\). Arranging the values in ascending order and then ranking them, we have:
\(W_1\) - the sum of ranks of the first sample
\(W_2\) - the sum of ranks of the second sample
\[W_1=1+3+4+5+7+10+12+13+14=69\]
\[U_1=W_1 -\frac{(n_1)(n_1 +1)}{2} =69 -\frac{(9)(10)}{2} =24\]
For \(n_1 =9\) and \(n_2=10\), the critical value is \(U_{2\alpha} =U_{2(0.05)} =24\)
Conclusion
Since \(24\leq 24\), we reject \(H_0\) and conclude that, on average, Brand A flames have a mean burning time that is less than that of Brand B.
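As a hedged R check, wilcox.test applied to the two brands with alternative = "less" reports the statistic W, which here equals \(U_1 = 24\), together with an exact p-value that can be compared with 0.05 directly.
# Verify Example 5.1 with the rank-sum test (base R)
brand_A <- c(14.9, 11.3, 13.2, 16.6, 17.0, 14.1, 15.4, 13.0, 16.9)
brand_B <- c(15.2, 19.8, 14.7, 18.3, 16.2, 21.2, 18.9, 12.2, 15.3, 19.4)
wilcox.test(brand_A, brand_B, alternative = "less", exact = TRUE)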
When both \(n_1\) and \(n_2\) are greater than 8, it is considered reasonable to assume that the distributions of \(U_1\) and \(U_2\) can be approximated closely by normal distributions. To perform the rank sum test on the basis of this assumption, we need the following results.
THEOREM
Under the null hypothesis, the means and variances of \(U_1\) and \(U_2\) are \[E(U_1)=E(U_2)=\frac{n_1 n_2}{2}\] \[Var(U_1)=Var(U_2)=\frac{n_1 n_2(n_1 + n_2 +1)}{12}\]
5.2 Solved Example
The following are the numbers of minutes it took a random sample of 15 men and 12 women to complete a written test given for the renewal of their driving licences.
| Men | Women |
|---|---|
| 9.9 | 8.6 |
| 7.4 | 10.9 |
| 8.9 | 9.8 |
| 9.1 | 10.7 |
| 7.7 | 9.4 |
| 9.7 | 10.3 |
| 11.8 | 7.3 |
| 9.2 | 11.5 |
| 10.0 | 7.6 |
| 10.2 | 9.3 |
| 9.5 | 8.8 |
| 10.8 | 9.6 |
| 8.0 | – |
| 11.0 | – |
| 7.5 | – |
Required
Use the U test based on the normal approximation to test.
\(H_0 : \mu_1= \mu_2\) against \(H_1 :\mu_1 \neq \mu_2\)
where \(\mu_1\) and \(\mu_2\) are the average amounts of time it takes men and women, respectively, to complete the test, at \(\alpha=0.01\)
Arranging the values in ascending order and then ranking them, we have:
\(W_1\) - the sum of ranks of the first sample
\(W_2\) - the sum of ranks of the second sample
\[W_1=2+3+5+6+9+10+11+14+16+18+19+20+23+25+27=208\] \[W_2=1+4+7+8+12+13+15+17+21+22+24+26=170\]
\[U_1=W_1 -\frac{(n_1)(n_1 +1)}{2} =208-(15)(8)=88\] \[U_2=W_2 -\frac{(n_2)(n_2 +1)}{2} =170-(6)(13)=92\]
The test statistics,
\(U= min (U_1, U_2) = min(88,92) = 88\)
\[E(U_1)=E(U_2)=\frac{n_1 n_2}{2}=\frac{15*12}{2}=90\] \[Var(U_1)=Var(U_2)=\frac{n_1 n_2(n_1 + n_2 +1)}{12}=\frac{15*12(28)}{12}=420\]
\[Z=\frac{U-E(U)}{\sqrt{Var(U)}}=\frac{88-90}{\sqrt{420}}=-0.098 \sim -0.1\]
Conclusion
Since \(|Z| = 0.1 < 2.576\), we fail to reject \(H_0\) and conclude that there is no evidence of a difference in the average amount of time it takes men and women to complete the test at \(\alpha=0.01\)
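The normal-approximation arithmetic can be reproduced with a few lines of R (a sketch only; wilcox.test itself adds a continuity correction and a tie adjustment, so its Z and p-value may differ slightly).
# Sketch: manual normal approximation for Example 5.2
n1 <- 15; n2 <- 12
U  <- 88                               # min(U1, U2) from the rank sums above
mu <- n1 * n2 / 2                      # E(U) = 90
v  <- n1 * n2 * (n1 + n2 + 1) / 12     # Var(U) = 420
(U - mu) / sqrt(v)                     # Z ≈ -0.098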
5.3 Solved Example
Compare exam scores between two teaching methods (α = 0.05)
Method A: 78, 85, 92, 65, 70, 88
Method B: 72, 68, 80, 75, 82, 79, 74
Solution
Step 1: State Hypotheses
H₀: The two populations are identical
H₁: Method A tends to produce higher scores than Method B (one-tailed)
Step 2: Combine and Rank All Scores
| Score | Group | Rank |
|---|---|---|
| 65 | A | 1 |
| 68 | B | 2 |
| 70 | A | 3 |
| 72 | B | 4 |
| 74 | B | 5 |
| 75 | B | 6 |
| 78 | A | 7 |
| 79 | B | 8 |
| 80 | B | 9 |
| 82 | B | 10 |
| 85 | A | 11 |
| 88 | A | 12 |
| 92 | A | 13 |
Step 3: Calculate Rank Sums
R₁ (Method A) = 1 + 3 + 7 + 11 + 12 + 13 = 47
R₂ (Method B) = 2 + 4 + 5 + 6 + 8 + 9 + 10 = 44
Step 4: Calculate U Statistics
n₁ = 6, n₂ = 7
U₁ = n₁n₂ + n₁(n₁+1)/2 - R₁ = 6×7 + 6×7/2 - 47 = 42 + 21 - 47 = 16
U₂ = n₁n₂ + n₂(n₂+1)/2 - R₂ = 6×7 + 7×8/2 - 44 = 42 + 28 - 44 = 26
U = min(U₁, U₂) = 16
Step 5: Critical Value (n₁=6, n₂=7, α=0.05, one-tailed)
From Mann-Whitney table: Critical value = 8
Step 6: Decision
U = 16 > 8 ⇒ Fail to reject H₀
Conclusion: No evidence that Method A produces higher scores.
R Code Verification
# Example 1: Teaching Methods
method_A <- c(78, 85, 92, 65, 70, 88)
method_B <- c(72, 68, 80, 75, 82, 79, 74)
wilcox.test(method_A, method_B, alternative = "greater", exact = TRUE)
##
## Wilcoxon rank sum exact test
##
## data: method_A and method_B
## W = 26, p-value = 0.2669
## alternative hypothesis: true location shift is greater than 0
# Manual calculation function
manual_mann_whitney <- function(group1, group2) {
# Combine and rank data
combined <- c(group1, group2)
ranks <- rank(combined)
n1 <- length(group1)
n2 <- length(group2)
R1 <- sum(ranks[1:n1])
R2 <- sum(ranks[(n1+1):(n1+n2)])
U1 <- n1*n2 + n1*(n1+1)/2 - R1
U2 <- n1*n2 + n2*(n2+1)/2 - R2
cat("Group 1 ranks sum (R1):", R1, "\n")
cat("Group 2 ranks sum (R2):", R2, "\n")
cat("U1 =", U1, "U2 =", U2, "\n")
cat("Test statistic U =", min(U1, U2), "\n")
return(list(U1 = U1, U2 = U2, U = min(U1, U2), R1 = R1, R2 = R2))
}
manual_mann_whitney(method_A, method_B)
## Group 1 ranks sum (R1): 47
## Group 2 ranks sum (R2): 44
## U1 = 16 U2 = 26
## Test statistic U = 16
## $U1
## [1] 16
##
## $U2
## [1] 26
##
## $U
## [1] 16
##
## $R1
## [1] 47
##
## $R2
## [1] 44
# Example 2: Reaction Times
young <- c(280, 295, 310, 290, 285, 300)
elderly <- c(320, 335, 310, 325, 330, 315, 340)
wilcox.test(young, elderly, alternative = "less", exact = TRUE)
## Warning in wilcox.test.default(young, elderly, alternative = "less", exact =
## TRUE): cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: young and elderly
## W = 0.5, p-value = 0.002111
## alternative hypothesis: true location shift is less than 0
manual_mann_whitney(young, elderly)
## Group 1 ranks sum (R1): 21.5
## Group 2 ranks sum (R2): 69.5
## U1 = 41.5 U2 = 0.5
## Test statistic U = 0.5
## $U1
## [1] 41.5
##
## $U2
## [1] 0.5
##
## $U
## [1] 0.5
##
## $R1
## [1] 21.5
##
## $R2
## [1] 69.5
5.4 Solved Example
Compare reaction times between two age groups (α = 0.05)
Young: 280, 295, 310, 290, 285, 300
Elderly: 320, 335, 310, 325, 330, 315, 340
Solution
Step 1: State Hypotheses
H₀: The two populations are identical
H₁: Young group has faster reaction times (one-tailed)
Step 2: Combine and Rank All Scores (Handle Ties)
| Time | Group | Rank |
|---|---|---|
| 280 | Young | 1 |
| 285 | Young | 2 |
| 290 | Young | 3 |
| 295 | Young | 4 |
| 300 | Young | 5 |
| 310 | Young | 6.5 |
| 310 | Elderly | 6.5 |
| 315 | Elderly | 8 |
| 320 | Elderly | 9 |
| 325 | Elderly | 10 |
| 330 | Elderly | 11 |
| 335 | Elderly | 12 |
| 340 | Elderly | 13 |
Step 3: Calculate Rank Sums
R₁ (Young) = 1 + 2 + 3 + 4 + 5 + 6.5 = 21.5
R₂ (Elderly) = 6.5 + 8 + 9 + 10 + 11 + 12 + 13 = 69.5
Step 4: Calculate U Statistics
n₁ = 6, n₂ = 7
U₁ = n₁n₂ + n₁(n₁+1)/2 - R₁ = 6×7 + 6×7/2 - 21.5 = 42 + 21 - 21.5 = 41.5
U₂ = n₁n₂ + n₂(n₂+1)/2 - R₂ = 6×7 + 7×8/2 - 69.5 = 42 + 28 - 69.5 = 0.5
U = min(U₁, U₂) = 0.5
Step 5: Critical Value (n₁=6, n₂=7, α=0.05, one-tailed)
From Mann-Whitney table: Critical value = 8
Step 6: Decision
U = 0.5 ≤ 8 ⇒ Reject H₀
Conclusion: Strong evidence that young group has faster reaction times
5.5 Exercise & Assignment
- A pharmaceutical company tested a new drug against a placebo. Patients rated their symptom improvement (0-100 scale):
Drug Group: 75, 82, 68, 90, 78, 85, 72, 88
Placebo Group: 62, 58, 71, 65, 70, 63, 67, 60, 64
Required:
Perform manual Mann-Whitney U test (show all steps)
Test at α = 0.01 significance level
Calculate exact p-value using normal approximation
Verify with R code
Interpret results for clinical context
- Compare Employee productivity between two departments:
Department A: 45, 52, 38, 60, 47, 55, 42, 51, 49, 53
Department B: 58, 62, 55, 65, 60, 57, 63, 59, 61, 58, 56, 64
Required:
Perform manual Mann-Whitney U test with normal approximation
Calculate effect size (r = Z/√N)
Construct 95% confidence interval for median difference
Compare with independent t-test results
Write comprehensive statistical report
R Code Template for Assignments
# Assignment Question 1 Template
drug <- c(75, 82, 68, 90, 78, 85, 72, 88)
placebo <- c(62, 58, 71, 65, 70, 63, 67, 60, 64)
# Mann-Whitney U Test
wilcox.test(drug, placebo, alternative = "greater", exact = TRUE)
##
## Wilcoxon rank sum exact test
##
## data: drug and placebo
## W = 70, p-value = 0.0001645
## alternative hypothesis: true location shift is greater than 0
# Manual calculation
manual_mann_whitney(drug, placebo)
## Group 1 ranks sum (R1): 106
## Group 2 ranks sum (R2): 47
## U1 = 2 U2 = 70
## Test statistic U = 2
## $U1
## [1] 2
##
## $U2
## [1] 70
##
## $U
## [1] 2
##
## $R1
## [1] 106
##
## $R2
## [1] 47
# Assignment Question 2 Template
dept_A <- c(45, 52, 38, 60, 47, 55, 42, 51, 49, 53)
dept_B <- c(58, 62, 55, 65, 60, 57, 63, 59, 61, 58, 56, 64)
wilcox.test(dept_A, dept_B, alternative = "two.sided", exact = FALSE)
##
## Wilcoxon rank sum test with continuity correction
##
## data: dept_A and dept_B
## W = 7, p-value = 0.0005309
## alternative hypothesis: true location shift is not equal to 0
# Normal approximation details
mw_details <- function(group1, group2) {
result <- manual_mann_whitney(group1, group2)
n1 <- length(group1)
n2 <- length(group2)
# Normal approximation
mu_U <- n1*n2/2
sigma_U <- sqrt(n1*n2*(n1+n2+1)/12)
z <- (result$U - mu_U)/sigma_U
cat("Mean U:", mu_U, "\n")
cat("SD U:", sigma_U, "\n")
cat("Z-score:", z, "\n")
# Effect size
effect_r <- abs(z)/sqrt(n1+n2)
cat("Effect size r:", effect_r, "\n")
return(list(z = z, effect_r = effect_r))
}
mw_details(dept_A, dept_B)
## Group 1 ranks sum (R1): 62
## Group 2 ranks sum (R2): 191
## U1 = 113 U2 = 7
## Test statistic U = 7
## Mean U: 60
## SD U: 15.16575
## Z-score: -3.494717
## Effect size r: 0.7450761
## $z
## [1] -3.494717
##
## $effect_r
## [1] 0.7450761
5.6 Procedure
Combine the two groups into one dataset
Assign ranks to all observations (average ranks in case of ties)
Compute the sum of ranks for each group
Calculate the U statistic:
\[ U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 \]
where:
- \(n_1\) = size of group 1
- \(R_1\) = sum of ranks in group 1
- Similarly, compute \(U\) for group 2
- The smaller of the two \(U\) values is the test statistic
5.7 Decision Rule
- For small samples: Compare the observed \(U\) with critical values from Mann–Whitney tables
- For large samples: Convert to a z-score and use the normal approximation
5.8 Interpretation
- A small U indicates that the distributions differ (one group tends to have systematically higher/lower values than the other)
5.9 Advantages
- Doesn’t assume normality
- Works with ordinal or continuous data
- More robust than the t-test under non-normal distributions
5.10 Limitations
- Less powerful than the t-test when data are truly normal
- Tests difference in distributions, not strictly medians (though often interpreted as median comparison)
6 Kruskal-Wallis H test
The H-test is a generalization of the rank sum test to the case where we test the null hypothesis that \(k\) samples come from identical continuous populations. As in the U-test, the data are ranked jointly from low to high as though they constitute one sample. Then, letting \(R_i\) be the sum of the ranks of the values of the \(i^{th}\) sample, we base the test on the statistic
\[H=\frac{12}{n(n+1)} \sum_{i=1}^{k} n_i\left[\frac{R_i}{n_i}-\frac{n+1}{2}\right]^{2}\] The \(H\) statistic is therefore proportional to a weighted sum of the squared differences \[\left[\frac{R_i}{n_i}-\frac{n+1}{2}\right]^{2}\]
where \(\frac{R_i}{n_i}\) is the mean rank for sample \(i\) and \(\frac{n+1}{2}\) is the mean rank for all the data; it follows that the null hypothesis must be rejected for large values of \(H\).
Simplifying the expression, we have
\[\left[(\frac{R_i}{n_i})-(\frac{n+1}{2})\right]^{2}=\frac{R_{i}^{2}}{n_{i}^{2}}-\frac{2R_i}{n_i}\left(\frac{n+1}{2}\right)+\left[\frac{(n+1)}{2}\right]^2\]
\[\left[(\frac{R_i}{n_i})-(\frac{n+1}{2})\right]^{2}=\frac{R_{i}^{2}}{n_{i}^{2}}-\frac{R_i}{n_i }(n+1)+\frac{(n+1)^2}{4}\]
\[n_i\left[\frac{R_i}{n_i}-\frac{n+1}{2}\right]^{2}=\frac{R_{i}^{2}}{n_{i}}-R_i(n+1)+n_i\frac{(n+1)^2}{4}\]
\[\sum_{i=1}^{k}n_i\left[\frac{R_{i}}{n_{i}}-\frac{n+1}{2}\right]^2=\sum_{i=1}^{k}\frac{R_{i}^{2}}{n_{i}}-(n+1)\sum_{i=1}^{k}R_i +\frac{(n+1)^2}{4}\sum_{i=1}^{k}n_i\]
\[\sum_{i=1}^{k}n_i\left[\frac{R_{i}}{n_{i}}-\frac{n+1}{2}\right]^2=\sum_{i=1}^{k}\frac{R_{i}^{2}}{n_{i}}-(n+1)\frac{n(n+1)}{2}+\frac{n(n+1)^2}{4}\]
\[\sum_{i=1}^{k}n_i\left[\frac{R_{i}}{n_{i}}-\frac{n+1}{2}\right]^2=\sum_{i=1}^{k}\frac{R_{i}^{2}}{n_{i}}-\frac{n(n+1)^2}{4}\]
Substituting in H Statistic, we have
\[H=\frac{12}{n(n+1)}\left[\sum_{i=1}^{k}\frac{R_{i}^{2}}{n_{i}}-\frac{n(n+1)^2}{4}\right] \]
\[H=\frac{12}{n(n+1)}\sum_{i=1}^{k}\left[\frac{R_{i}^{2}}{n_{i}}\right]-3(n+1)\]
For very small values of \(k\) and \(n_i\), the test of the null hypothesis may be based on special tables, but since the sampling distribution of \(H\) depends on the values of the \(n_i\), it is impossible to tabulate it in a compact form. Hence, the test is usually based on the large-sample result that the sampling distribution of \(H\) can be approximated closely by a chi-square distribution with \(k-1\) degrees of freedom.
The Kruskal-Wallis H test is a non-parametric statistical test used to determine if there are statistically significant differences between three or more independent groups. It is the non-parametric alternative to the one-way ANOVA.
The Kruskal–Wallis test is a non-parametric method for testing whether samples originate from the same distribution. It is used for comparing more than two independent groups.
Key Characteristics:
Non-parametric alternative to one-way ANOVA
Compares three or more independent groups
Uses ranks of the data rather than the raw data
Tests the null hypothesis that the medians of the groups are equal
Assumptions:
The data are independent and randomly sampled.
The dependent variable should be measured at least at the ordinal level.
The distributions of the groups should have the same shape.
Hypotheses:
H₀: The medians of all groups are equal.
H₁: At least one group has a median different from the others.
The test statistic \(H\) is calculated as:
\[ H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1) \]
Where:
- \(N\) = total number of observations across all groups
- \(k\) = number of groups
- \(R_i\) = sum of ranks for group \(i\)
- \(n_i\) = number of observations in group \(i\)
Adjustment for Ties
If there are tied ranks, the test statistic is adjusted by dividing \(H\) by:
\[ 1 - \frac{\sum (t^3 - t)}{N^3 - N} \]
Where:
- \(t\) = number of tied observations in each group of ties
- The summation is over all groups of tied ranks
Decision Rule:
Reject H₀ if H ≥ critical value from chi-square distribution with (k-1) degrees of freedom, or if p-value ≤ α.
Interpretation
- A large H suggests that at least one group differs significantly from the others.
- The test statistic follows a chi-squared distribution with \(k - 1\) degrees of freedom.
Notes
- The Kruskal–Wallis test is an extension of the Mann–Whitney U test for more than two groups.
- It does not assume normality and is suitable for ordinal or non-normally distributed interval data.
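A small sketch (the helper name kruskal_H is ours, illustrative only) shows how the H statistic is computed directly from the joint ranks; kruskal.test in base R does the same and additionally applies the tie correction.
# Sketch: Kruskal-Wallis H from joint ranks (hypothetical helper, no tie correction)
kruskal_H <- function(groups) {          # groups: a list of numeric vectors
  x <- unlist(groups)
  g <- rep(seq_along(groups), lengths(groups))
  r <- rank(x)                           # joint ranks, ties averaged
  N <- length(x)
  R <- tapply(r, g, sum)                 # rank sum for each group
  n <- lengths(groups)
  12 / (N * (N + 1)) * sum(R^2 / n) - 3 * (N + 1)
}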
6.1 Solved Example
The following are the miles per gallon which a test driver got for ten tanks of each of three kinds of gasoline.
| Gasoline A | Gasoline B | Gasoline C |
|---|---|---|
| 20 | 29 | 19 |
| 31 | 18 | 31 |
| 24 | 29 | 16 |
| 33 | 19 | 26 |
| 23 | 20 | 31 |
| 24 | 21 | 33 |
| 28 | 34 | 28 |
| 16 | 33 | 28 |
| 19 | 30 | 25 |
| 26 | 23 | 30 |
Required
Use the Kruskal Wallis test to test whether or not there is a difference in the actual average mileage yield of the three kinds of gasoline at \(\alpha=0.05\)
Solution
Arrange the values in ascending order and rank the values jointly
| Value | Rank | Rank (ties averaged) |
|---|---|---|
| 16 | 1 | 1.5 |
| 16 | 2 | 1.5 |
| 18 | 3 | 3 |
| … | … | … |
| 33 | 29 | 28 |
| 34 | 30 | 30 |
\(R_{1}=136.0\) is the sum of ranks for sample \(A\)
\(R_{2}=156.5\) is the sum of ranks for sample \(B\)
\(R_{3}=172.5\) is the sum of ranks for sample \(C\)
\[H=\frac{12}{n(n+1)}\sum_{i=1}^{k}\left[\frac{R_{i}^{2}}{n_{i}}\right]-3(n+1)=\frac{12}{30(31)}\left[\frac{136^{2}}{10}+\frac{156.5^{2}}{10}+\frac{172.5^{2}}{10}\right]-3(31)=0.86\]
The critical chi-square value is \(\chi^2_{0.05,2}=5.991\)
Conclusion
Since \(0.86<5.991\), we fail to reject \(H_0\) and conclude that there is no difference in the actual average mileage yield of the three kinds of gasoline at \(\alpha=0.05\)
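Checking this in R: the sketch above should give a value very close to the hand-computed \(H = 0.86\), and kruskal.test should agree apart from its tie correction, with a large p-value consistent with failing to reject \(H_0\).
# Verify the gasoline example
gas_A <- c(20, 31, 24, 33, 23, 24, 28, 16, 19, 26)
gas_B <- c(29, 18, 29, 19, 20, 21, 34, 33, 30, 23)
gas_C <- c(19, 31, 16, 26, 31, 33, 28, 28, 25, 30)
kruskal_H(list(gas_A, gas_B, gas_C))        # manual H from the sketch above
kruskal.test(list(gas_A, gas_B, gas_C))     # built-in test with tie correction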
6.2 Solved Example
Compare test scores across three teaching methods (α = 0.05)
Method A: 78, 85, 92, 65, 70
Method B: 72, 68, 80, 75, 82
Method C: 90, 88, 95, 85, 80
Solution
Step 1: State Hypotheses
H₀: The medians of the three groups are equal
H₁: At least one group has a different median
Step 2: Combine and Rank All Scores
| Score | Group | Rank |
|---|---|---|
| 65 | A | 1 |
| 68 | B | 2 |
| 70 | A | 3 |
| 72 | B | 4 |
| 75 | B | 5 |
| 78 | A | 6 |
| 80 | B | 7.5 |
| 80 | C | 7.5 |
| 82 | B | 9 |
| 85 | A | 10.5 |
| 85 | C | 10.5 |
| 88 | C | 12 |
| 90 | C | 13 |
| 92 | A | 14 |
| 95 | C | 15 |
Step 3: Calculate Rank Sums for Each Group
\[R₁ (Method A) = 1 + 3 + 6 + 10.5 + 14 = 34.5\]
\[R₂ (Method B) = 2 + 4 + 5 + 7.5 + 9 = 27.5\]
\[R₃ (Method C) = 7.5 + 10.5 + 12 + 13 + 15 = 58\]
Step 4: Calculate H Statistic
- Total observations: \(N = 15\)
- Number of groups: \(k = 3\)
- Group sizes: \(n_1 = n_2 = n_3 = 5\)
- Sum of ranks:
- \(R_1 = 34.5\)
- \(R_2 = 27.5\)
- \(R_3 = 58\)
\[ H = \frac{12}{N(N+1)} \left( \frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \frac{R_3^2}{n_3} \right) - 3(N+1) \]
\[ H = \frac{12}{240} (238.05 + 151.25 + 672.8) - 48 = 5.105 \]
Step 5: Adjust for Ties
We have two ties:
80 appears twice (ranks 7 and 8 → average 7.5)
85 appears twice (ranks 10 and 11 → average 10.5)
- For value 80: \(t = 2\)
- For value 85: \(t = 2\)
\[ \sum (t^3 - t) = (2^3 - 2) + (2^3 - 2) = (8 - 2) + (8 - 2) = 6 + 6 = 12 \]
Total observations: \(N = 15\)
\[ \text{Correction Factor} = 1 - \frac{\sum (t^3 - t)}{N^3 - N} \]
\[ = 1 - \frac{12}{3375 - 15} = 1 - \frac{12}{3360} = 1 - 0.00357 = 0.99643 \]
Adjusted H Statistic
Original \(H = 5.105\)
\[ H_{\text{adjusted}} = \frac{5.105}{0.99643} \approx 5.123 \]
Step 6: Critical Value and Decision
Degrees of freedom = k-1 = 2
Chi-square critical value at α=0.05 is 5.991
Since 5.123 < 5.991, we fail to reject H₀
Conclusion: No significant difference in test scores among the three teaching methods.
6.3 Solved Example
Compare reaction times across four age groups (α = 0.05)
Group 1 (20-30): 280, 295, 310, 290, 285
Group 2 (31-40): 320, 335, 310, 325, 330, 315
Group 3 (41-50): 340, 355, 350, 345, 360
Group 4 (51-60): 370, 385, 380, 375, 390, 395, 400
Solution
Step 1: State Hypotheses
H₀: The medians of the four groups are equal
H₁: At least one group has a different median
Step 2: Combine and Rank All Reaction Times
We have 5+6+5+7 = 23 observations.
Due to the large dataset, we’ll summarize the ranks:
Group 1: 280 (1), 285 (2), 290 (3), 295 (4), 310 (5.5) [tie with Group 2’s 310]
Group 2: 310 (5.5), 315 (7), 320 (8), 325 (9), 330 (10), 335 (11)
Group 3: 340 (12), 345 (13), 350 (14), 355 (15), 360 (16)
Group 4: 370 (17), 375 (18), 380 (19), 385 (20), 390 (21), 395 (22), 400 (23)
Step 3: Calculate Rank Sums for Each Group
\[R₁ = 1+2+3+4+5.5 = 15.5\]
\[R₂ = 5.5+7+8+9+10+11 = 50.5\]
\[R₃ = 12+13+14+15+16 = 70\]
\[R₄ = 17+18+19+20+21+22+23 = 140\]
Step 4: Calculate H Statistic
- Total observations: \(N = 23\)
- Group sizes:
- \(n_1 = 5\)
- \(n_2 = 6\)
- \(n_3 = 5\)
- \(n_4 = 7\)
- Sum of ranks:
- \(R_1 = 15.5\)
- \(R_2 = 50.5\)
- \(R_3 = 70\)
- \(R_4 = 140\)
\[ H = \frac{12}{N(N+1)} \left( \frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \frac{R_3^2}{n_3} + \frac{R_4^2}{n_4} \right) - 3(N+1) \]
\[ H = \frac{12}{552} (48.05 + 425.0417 + 980 + 2800) - 72=20.5 \]
Step 5: Adjust for Ties
One tie: the value 310 appears twice.
\[ \sum (t^3 - t) = 2^3 - 2 = 8 - 2 = 6 \]
Correction factor:
\[ 1 - \frac{6}{N^3 - N} = 1 - \frac{6}{12167 - 23} = 1 - \frac{6}{12144} = 1 - 0.000494 = 0.999506 \]
Adjusted H Statistic
\[ H_{\text{adjusted}} = \frac{20.5}{0.999506} \approx 20.51 \]
Step 6: Critical Value and Decision
Degrees of freedom = 4-1 = 3
Chi-square critical value at α=0.05 is 7.815
Since 20.51 > 7.815, we reject H₀
Conclusion: There is a significant difference in reaction times among the age groups.
R Code Verification
# Example 1: Teaching Methods
method_A <- c(78, 85, 92, 65, 70)
method_B <- c(72, 68, 80, 75, 82)
method_C <- c(90, 88, 95, 85, 80)
# Combine into a list
scores <- list(method_A, method_B, method_C)
# Kruskal-Wallis Test
kruskal.test(scores)
##
## Kruskal-Wallis rank sum test
##
## data: scores
## Kruskal-Wallis chi-squared = 5.1233, df = 2, p-value = 0.07718
# Example 2: Reaction Times
group1 <- c(280, 295, 310, 290, 285)
group2 <- c(320, 335, 310, 325, 330, 315)
group3 <- c(340, 355, 350, 345, 360)
group4 <- c(370, 385, 380, 375, 390, 395, 400)
times <- list(group1, group2, group3, group4)
kruskal.test(times)
##
## Kruskal-Wallis rank sum test
##
## data: times
## Kruskal-Wallis chi-squared = 20.469, df = 3, p-value = 0.0001357
6.4 Exercise & Assignments
- A company wants to compare employee satisfaction across four departments. Satisfaction scores (0-100) are collected:
HR: 75, 82, 68, 90, 78, 85
IT: 72, 68, 80, 75, 82, 79, 74
Finance: 90, 88, 95, 85, 80, 92
Marketing: 65, 70, 72, 68, 75, 80, 78
Required
Perform manual Kruskal-Wallis H test (show all steps)
Test at α = 0.05 significance level
Adjust for ties if necessary
Verify with R code
Interpret results for management
- Compare plant growth (in cm) under four different fertilizers:
Fertilizer A: 15, 18, 22, 17, 20, 25
Fertilizer B: 28, 25, 30, 27, 32, 29, 31
Fertilizer C: 20, 23, 19, 21, 24, 22, 26, 18
Fertilizer D: 35, 40, 38, 42, 36, 39, 41
Required
Perform manual Kruskal-Wallis H test with normal approximation
Calculate effect size (epsilon-squared)
Conduct post-hoc pairwise comparisons using Mann-Whitney U tests with Bonferroni correction
Write comprehensive statistical report
# Assignment Question 1 Template
HR <- c(75, 82, 68, 90, 78, 85)
IT <- c(72, 68, 80, 75, 82, 79, 74)
Finance <- c(90, 88, 95, 85, 80, 92)
Marketing <- c(65, 70, 72, 68, 75, 80, 78)
satisfaction <- list(HR, IT, Finance, Marketing)
kruskal.test(satisfaction)
##
## Kruskal-Wallis rank sum test
##
## data: satisfaction
## Kruskal-Wallis chi-squared = 12.759, df = 3, p-value = 0.005189
# Assignment Question 2 Template
fert_A <- c(15, 18, 22, 17, 20, 25)
fert_B <- c(28, 25, 30, 27, 32, 29, 31)
fert_C <- c(20, 23, 19, 21, 24, 22, 26, 18)
fert_D <- c(35, 40, 38, 42, 36, 39, 41)
growth <- list(fert_A, fert_B, fert_C, fert_D)
kruskal.test(growth)
##
## Kruskal-Wallis rank sum test
##
## data: growth
## Kruskal-Wallis chi-squared = 22.778, df = 3, p-value = 4.493e-05
# Post-hoc tests with Bonferroni correction
pairwise.wilcox.test(unlist(growth), rep(c("A","B","C","D"), times=sapply(growth, length)),
p.adjust.method = "bonferroni")
## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
## exact p-value with ties
## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
## exact p-value with ties
##
## Pairwise comparisons using Wilcoxon rank sum test with continuity correction
##
## data: unlist(growth) and rep(c("A", "B", "C", "D"), times = sapply(growth, length))
##
## A B C
## B 0.0253 - -
## C 1.0000 0.0037 -
## D 0.0070 0.0035 0.0019
##
## P value adjustment method: bonferroni
# Effect size
kruskal_result <- kruskal.test(growth)
N <- sum(sapply(growth, length))
H <- kruskal_result$statistic
epsilon_squared <- (H - (length(growth)-1)) / (N - 1)
epsilon_squared
## Kruskal-Wallis chi-squared
## 0.7325068
7 The Run test
A run is a succession of identical letters( of other kind of symbols) which is proceeded and followed by different letters of no letter at all. Consider the following arrangement of defective d, and non defective n pieces produced in the given order by a certain machine.
nnnnn dddd nnnnnnnnnn dd nn dddd n dd nn
In this case there is a first run of \(5 n_{s}^{'}\) , then a run of \(4 d_{s}^{'}\), then a run of \(10n_{s}^{'}\) … and finally a run of \(2 n_{s}^{'}\). In all there are 9 runs of varying length. The total number of runs appearing in an arrangement of this kind is often a good indication of a possible lack of randomness. If there are too few runs, we might suspect a definite grouping or clustering or perhaps a trend, if there are two many runs we might suspect some sort of repeated alternations pattern. In our illustration, there seems to be some clustering. If we have \(n_1\) letters of one kind and \(n_2\) letters of another kind. There are;
\[ \binom{n_1 +n_2}{n_1} \]
possible arrangements of these letters. All arrangements are regarded as equally likely.
The Run Test is a non-parametric statistical test used to determine if a sequence of data values occurs in a random order. It analyzes the pattern of “runs” - sequences of consecutive observations that are either above or below a specified value (usually the median).
Key Concepts:
Run: A sequence of consecutive observations all above or all below the cutoff value
Purpose: Tests the randomness of data sequence
Non-parametric: No assumptions about distribution shape
Applications: Quality control, time series analysis, random number generation testing
Types of Run Tests:
Above/Below Median Test: Uses median as cutoff
Above/Below Mean Test: Uses mean as cutoff
Up/Down Run Test: Tests sequences of increases/decreases
Hypotheses:
H₀: The sequence is random
H₁: The sequence is not random
Test Statistic:
R = number of runs
For large samples: Normal approximation with:
\[Mean: μ_R =\frac{(2n₁n₂)}{(n₁+n₂)} + 1\]
\[Variance: \ σ²_R = \frac{2n₁n₂(2n₁n₂ - n₁ - n₂)}{(n₁+n₂)²(n₁+n₂-1)}\]
Solved Example
If we have 2 \(a\)'s and 2 \(b\)'s, then the number of possible arrangements of these letters is
The combination equation is given by:
\[ \binom{2 + 2}{2} = \binom{4}{2} = 6 \]
These arrangements are
aabb abab abba baba bbaa baab
with 2, 4, 3, 4, 2, and 3 runs respectively.
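As a quick check of this enumeration, the short R sketch below lists the six arrangements and counts the runs in each; the helper count_runs is our own, not from any package.
# Quick check of the enumeration above; count_runs is our own helper
count_runs <- function(s) {
  x <- strsplit(s, "")[[1]]
  1 + sum(x[-1] != x[-length(x)])      # a new run starts at every change of letter
}
arrangements <- c("aabb", "abab", "abba", "baba", "bbaa", "baab")
sapply(arrangements, count_runs)       # 2 4 3 4 2 3
choose(4, 2)                           # 6 equally likely arrangements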
Let there be \(n_1\) letters of one kind and \(n_2\) letters of another kind.
Let \(U\) be the number of runs formed by these \(n_1 + n_2\) letters.
If \(U=2k\) , where k is a positive integer, then we have \(k\) runs of the \(n_1\) letters and \(k\) runs of \(n_2\) letters.
The number of ways in which \(n_1\) letters can form \(k\) runs is
\[ \binom{n_1 -1}{k-1} \]
The number of ways in which \(n_2\) letters can form \(k\) runs is
\[ \binom{n_2 -1}{k-1} \]
The probability mass function for the number of runs \(U = 2k\) is given by:
\[ P(U = 2k) = f(u) = \frac{ 2 \binom{n_1 - 1}{k - 1} \binom{n_2 - 1}{k - 1} }{ \binom{n_1 + n_2}{n_1} } \]
Probabilities of \(U\) runs
If \(U = 2k + 1\), then
\[ f(u) = P[U_1 = k + 1, U_2 = k] + P[U_1 = k, U_2 = k + 1] \]
Hence,
\[ f(u) = \frac{ \binom{n_1 - 1}{k} \binom{n_2 - 1}{k - 1} }{ \binom{n_1 + n_2}{n_1} } + \frac{ \binom{n_1 - 1}{k - 1} \binom{n_2 - 1}{k} }{ \binom{n_1 + n_2}{n_1} } \]
\[ f(u) = \frac{ \binom{n_1 - 1}{k} \binom{n_2 - 1}{k - 1} + \binom{n_1 - 1}{k - 1} \binom{n_2 - 1}{k} }{ \binom{n_1 + n_2}{n_1} } \]
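The two formulas above can be combined into a small R function that returns the exact probability of observing \(u\) runs. This is a sketch under the assumption that the formulas hold as stated; run_pmf is our own name, not a library function.
# Exact P(U = u) for n1 letters of one kind and n2 of the other (run_pmf is our own helper)
run_pmf <- function(u, n1, n2) {
  denom <- choose(n1 + n2, n1)
  if (u %% 2 == 0) {                       # U = 2k: k runs of each kind
    k <- u / 2
    2 * choose(n1 - 1, k - 1) * choose(n2 - 1, k - 1) / denom
  } else {                                 # U = 2k + 1: k+1 runs of one kind, k of the other
    k <- (u - 1) / 2
    (choose(n1 - 1, k) * choose(n2 - 1, k - 1) +
       choose(n1 - 1, k - 1) * choose(n2 - 1, k)) / denom
  }
}
sapply(2:4, run_pmf, n1 = 2, n2 = 2)       # 1/3 1/3 1/3, matching the enumeration above
sum(sapply(2:4, run_pmf, n1 = 2, n2 = 2))  # the probabilities sum to 1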
When \(n_1\) and \(n_2\) are small, the test of randomness based on \(U\) (the number of runs) is usually performed with the use of special tables such as Table XI. We reject the null hypothesis of randomness at the level of significance \(\alpha\) if
\[ U \leq U_{\alpha / 2}' \quad \text{or} \quad U \geq U_{1 - \alpha / 2}' \]
where \(U'_{\alpha/2}\) is the largest value for which \(P(U\leq U'_{\alpha/2})\) does not exceed \(\alpha/2\), and \(U'_{1-\alpha/2}\) is the smallest value for which \(P(U\geq U'_{1-\alpha/2})\) does not exceed \(\alpha/2\).
7.1 Solved Example
Checking on LCM trees that were planted many years ago along a country road, a county official obtained the following arrangement of healthy, \(H\), and diseased, \(D\), trees
HHHHDDDHHHHHHHHDDHDDDD
Test whether this arrangement may be regarded as random at \(\alpha=0.05\)
Solution
\(H_0\): The arrangement is random vs \(H_1\): The arrangement is not random
From the tables, we reject \(H_0\) if \(U\leq 6\) or \(U\geq 17\) (here \(n_1 = 13\) \(H\)'s and \(n_2 = 9\) \(D\)'s). The observed number of runs is \(U = 6\).
Conclusion: Since \(U = 6 \leq 6\), we reject \(H_0\) and conclude that the arrangement of healthy and diseased trees is not random.
7.2 Solved Example
To test whether a radio signal contains a message or constitutes random noise, an interval of time is subdivided into a number of very short intervals, and for each of these it is determined whether the signal strength exceeds (E) or does not exceed (N) a certain level of background noise. Using the normal approximation, test whether the following arrangement thus obtained may be regarded as random, and hence that the signal does not contain a message.
NNNENENENEEENEEENEENENEENEENNENEEENENNNENNENNNNE
Solution
\(H_0\): Arrangement is random vs \(H_1\): Arrangement is not random
If \(U\) is the number of runs, then
The expected value and variance of \(U\) are given by:
\[ E(U) = \frac{2n_1 n_2}{n_1 + n_2} + 1 \]
Substituting \(n_1 = n_2 = 24\):
\[ E(U) = \frac{2 \times 24 \times 24}{24 + 24} + 1 = 25 \]
The variance of \(U\) is:
\[ Var(U) = \frac{2n_1 n_2 (2n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)} \]
Substituting the values:
\[ Var(U) = \frac{2 \times 24 \times 24 (2 \times 24 \times 24 - 24 - 24)}{(24 + 24)^2 (24 + 24 - 1)} = 11.74 \]
Hence, the standardized test statistic is:
\[ Z = \frac{U \pm 0.5 - E(U)}{\sqrt{Var(U)}} \]
Substituting \(U = 30\):
\[ Z = \frac{30 - 0.5 - 25}{\sqrt{11.74}} = 1.312 \]
Note
If \(U<E(U)\), add 0.5; if \(U>E(U)\), subtract 0.5.
The critical values at \(\alpha = 0.01\) are \(\pm 2.575\).
Conclusion
Since \(-2.575<1.312<2.575\), we fail to reject \(H_0\); the arrangement may be regarded as random, so the signal does not contain a message at \(\alpha=0.01\).
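A few lines of R reproduce the normal-approximation calculation above; the variable names are ours, and \(n_1 = n_2 = 24\), \(U = 30\) are taken from the example.
# Check of the signal example: n1 = n2 = 24 E's and N's, U = 30 observed runs
n1 <- 24; n2 <- 24; U <- 30
mu_U  <- 2 * n1 * n2 / (n1 + n2) + 1                    # 25
var_U <- 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) /
  ((n1 + n2)^2 * (n1 + n2 - 1))                         # about 11.74
Z <- (U - 0.5 - mu_U) / sqrt(var_U)                     # subtract 0.5 since U > E(U)
Z                                                       # about 1.31, inside (-2.575, 2.575)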
7.3 Solved Example
Test if the following sequence of 20 measurements is random (α = 0.05):
Data: 45, 52, 38, 60, 47, 55, 42, 51, 49, 53, 58, 62, 55, 65, 60, 57, 63, 59, 61, 58
Solution
Step 1: Calculate Median
Sorted data: 38, 42, 45, 47, 49, 51, 52, 53, 55, 55, 57, 58, 58, 59, 60, 60, 61, 62, 63, 65
Median = (55 + 57)/2 = 56
Step 2: Convert to A/B Sequence
A = Above median, B = Below median
45(B), 52(B), 38(B), 60(A), 47(B), 55(B), 42(B), 51(B), 49(B), 53(B), 58(A), 62(A), 55(B), 65(A), 60(A), 57(A), 63(A), 59(A), 61(A), 58(A)
Step 3: Identify Runs
Sequence: B, B, B, A, B, B, B, B, B, B, A, A, B, A, A, A, A, A, A, A
Runs: BBB | A | BBBBBB | AA | B | AAAAAAA
Number of runs (R) = 6
Step 4: Count n₁ and n₂
n₁ (A’s) = 10
n₂ (B’s) = 10
Total n = 20
Step 5: Calculate Mean and Variance
\[μ_R = \frac{2n₁n₂}{n₁+n₂} + 1 = \frac{2×10×10}{10+10} + 1 = 11\]
\[σ²_R =\frac{2n₁n₂(2n₁n₂ - n₁ - n₂)}{(n₁+n₂)²(n₁+n₂-1)}=\frac{200×180}{400×19}=4.737\]
\[σ_R = \sqrt{4.737} = 2.176\]
Step 6: Calculate Test Statistic
\[Z = \frac{R - μ_R}{σ_R} = \frac{6 - 11}{2.176} = -2.297\]
Step 7: Critical Value and Decision
Two-tailed test, α = 0.05
Critical Z-value = ±1.96
-2.297 < -1.96 ⇒ Reject H₀
Conclusion: Sequence is not random.
7.4 Solved Example
Test if the following stock price sequence shows random fluctuations (α = 0.05):
Prices: 100, 102, 98, 105, 103, 107, 110, 108, 112, 115, 113, 118, 120, 119, 122
Solution
Step 1: Calculate Differences
| Price | Difference | Sign |
|---|---|---|
| 100 | – | – |
| 102 | +2 | + |
| 98 | -4 | – |
| 105 | +7 | + |
| 103 | -2 | – |
| 107 | +4 | + |
| 110 | +3 | + |
| 108 | -2 | – |
| 112 | +4 | + |
| 115 | +3 | + |
| 113 | -2 | – |
| 118 | +5 | + |
| 120 | +2 | + |
| 119 | -1 | – |
| 122 | +3 | + |
Step 2: Identify Runs in Sign Sequence
Sequence: +, -, +, -, +, +, -, +, +, -, +, +, -, +
Runs: + | - | + | - | ++ | - | ++ | - | ++ | - | +
Number of runs (R) = 11
Step 3: Calculate Parameters
For the up/down runs test with n = 14 sign values (differences between consecutive observations):
\[μ_R = (2n - 1)/3 = (2×14 - 1)/3 = (28 - 1)/3 = 27/3 = 9\]
\[σ²_R = (16n - 29)/90 = (16×14 - 29)/90 = (224 - 29)/90 = 195/90 = 2.167\] \[σ_R = √2.167 = 1.472\]
Step 4: Calculate Test Statistic
\[Z = (R - μ_R)/σ_R = (11 - 9)/1.472 = 2/1.472 = 1.359\]
Step 5: Critical Value and Decision
Two-tailed test, α = 0.05
Critical Z-value = ±1.96
1.359 < 1.96 ⇒ Fail to reject H₀
Conclusion: Stock price fluctuations appear random.
R Code Verification
# Example 1: Above/Below Median Test
data1 <- c(45, 52, 38, 60, 47, 55, 42, 51, 49, 53, 58, 62, 55, 65, 60, 57, 63, 59, 61, 58)
# Using randtests package
if(require(randtests)) {
runs.test(data1, threshold = median(data1))
}
## Loading required package: randtests
##
## Runs Test
##
## data: data1
## statistic = -2.2973, runs = 6, n1 = 10, n2 = 10, n = 20, p-value =
## 0.0216
## alternative hypothesis: nonrandomness
# Manual calculation function
manual_runs_test <- function(data, cutoff = median(data)) {
# Create sequence of A/B
sequence <- ifelse(data > cutoff, "A", "B")
cat("Sequence:", sequence, "\n")
# Count runs
runs <- 1
for(i in 2:length(sequence)) {
if(sequence[i] != sequence[i-1]) {
runs <- runs + 1
}
}
cat("Number of runs (R):", runs, "\n")
# Count n1 and n2
n1 <- sum(sequence == "A")
n2 <- sum(sequence == "B")
cat("n1 (A's):", n1, "n2 (B's):", n2, "\n")
# Calculate mean and variance
mu_R <- (2*n1*n2)/(n1+n2) + 1
var_R <- (2*n1*n2*(2*n1*n2 - n1 - n2))/((n1+n2)^2*(n1+n2-1))
sigma_R <- sqrt(var_R)
cat("Mean (μ_R):", mu_R, "\n")
cat("Variance (σ²_R):", var_R, "\n")
cat("SD (σ_R):", sigma_R, "\n")
# Calculate Z-statistic
Z <- (runs - mu_R)/sigma_R
cat("Z-statistic:", Z, "\n")
return(list(runs = runs, Z = Z, p_value = 2*pnorm(-abs(Z))))
}
manual_runs_test(data1)
## Sequence: B B B A B B B B B B A A B A A A A A A A
## Number of runs (R): 6
## n1 (A's): 10 n2 (B's): 10
## Mean (μ_R): 11
## Variance (σ²_R): 4.736842
## SD (σ_R): 2.176429
## Z-statistic: -2.297341
## $runs
## [1] 6
##
## $Z
## [1] -2.297341
##
## $p_value
## [1] 0.0215993
# Example 2: Up/Down Run Test
prices <- c(100, 102, 98, 105, 103, 107, 110, 108, 112, 115, 113, 118, 120, 119, 122)
# Using randtests package
if(require(randtests)) {
runs.test(prices, alternative = "two.sided", plot = TRUE)
}
##
## Runs Test
##
## data: prices
## statistic = -3.3381, runs = 2, n1 = 7, n2 = 7, n = 14, p-value =
## 0.0008436
## alternative hypothesis: nonrandomness
# Manual up/down test
manual_up_down_test <- function(data) {
# Calculate differences and signs
differences <- diff(data)
signs <- sign(differences)
cat("Differences:", differences, "\n")
cat("Signs:", signs, "\n")
# Count runs
runs <- 1
for(i in 2:length(signs)) {
if(signs[i] != signs[i-1]) {
runs <- runs + 1
}
}
cat("Number of runs (R):", runs, "\n")
n <- length(signs)
mu_R <- (2*n - 1)/3
var_R <- (16*n - 29)/90
sigma_R <- sqrt(var_R)
cat("n:", n, "\n")
cat("Mean (μ_R):", mu_R, "\n")
cat("Variance (σ²_R):", var_R, "\n")
cat("SD (σ_R):", sigma_R, "\n")
Z <- (runs - mu_R)/sigma_R
cat("Z-statistic:", Z, "\n")
return(list(runs = runs, Z = Z, p_value = 2*pnorm(-abs(Z))))
}
manual_up_down_test(prices)
## Differences: 2 -4 7 -2 4 3 -2 4 3 -2 5 2 -1 3
## Signs: 1 -1 1 -1 1 1 -1 1 1 -1 1 1 -1 1
## Number of runs (R): 11
## n: 14
## Mean (μ_R): 9
## Variance (σ²_R): 2.166667
## SD (σ_R): 1.47196
## Z-statistic: 1.358732
## $runs
## [1] 11
##
## $Z
## [1] 1.358732
##
## $p_value
## [1] 0.1742314
7.5 Exercise & Assignment
- A sample of 36 tools produced by a machine shows the following sequence of good (G) and bad (B) tools:
GGGBGGGGGBBBGGGGGGBBGGGGGGGGGBBGGGGG
Test whether the machine produced good (G) and bad (B) tools in random order at the 5% level of significance.
- The following are the numbers of students absent from school on 30 consecutive school days:
49 46 57 37 45 57 50 65 34 50 58 46 47 42 53 67 40 52 49 66 53 48 68 40 53 49 61 54 48 38
Test for randomness at 5% significance level
- A driver buys gasoline either at a BP station (B) or at a Total station (T), and the following arrangement shows the order of the stations from which she bought gasoline over a certain period of time.
BBTBBBBTTBBBBTBTTBBBBTBTTBBBBBTBBBTBTBBBTTBTTTTBBBBBTTBBBTTT
Test for randomness at 5% significance level
- A manufacturing process produces items with the following diameters (mm):
Data: 25.1, 25.3, 24.8, 25.5, 24.9, 25.2, 24.7, 25.4, 25.0, 25.6, 24.6, 25.7, 24.5, 25.8, 24.4, 25.9, 24.3, 26.0, 24.2, 26.1
Required:
Perform runs test above/below median (manual + R)
Test at α = 0.05 significance level
Calculate exact p-value
Interpret results for quality control
Suggest possible causes if non-random
- Daily returns (%) for a stock over 25 days (Financial Data Analysis):
Returns: 1.2, -0.8, 2.1, -1.5, 0.9, -2.3, 1.8, -0.7, 2.5, -1.9, 0.6, -2.7, 1.4, -0.5, 2.8, -2.1, 0.8, -3.2, 1.7, -0.9, 2.3, -1.8, 0.7, -2.9, 1.5
Required:
Perform up/down runs test (manual + R)
Test at α = 0.01 significance level
Calculate confidence interval for number of runs
Compare with above/below mean test
Write comprehensive analysis report
R Code Template for Assignments
# Assignment Question 1 Template
diameters <- c(25.1, 25.3, 24.8, 25.5, 24.9, 25.2, 24.7, 25.4, 25.0, 25.6,
24.6, 25.7, 24.5, 25.8, 24.4, 25.9, 24.3, 26.0, 24.2, 26.1)
# Using randtests package
if(require(randtests)) {
result1 <- runs.test(diameters, threshold = median(diameters))
print(result1)
}
##
## Runs Test
##
## data: diameters
## statistic = 4.1352, runs = 20, n1 = 10, n2 = 10, n = 20, p-value =
## 3.546e-05
## alternative hypothesis: nonrandomness
# Manual calculation
manual_runs_test(diameters)
## Sequence: B A B A B A B A B A B A B A B A B A B A
## Number of runs (R): 20
## n1 (A's): 10 n2 (B's): 10
## Mean (μ_R): 11
## Variance (σ²_R): 4.736842
## SD (σ_R): 2.176429
## Z-statistic: 4.135215
## $runs
## [1] 20
##
## $Z
## [1] 4.135215
##
## $p_value
## [1] 3.54623e-05
# Assignment Question 2 Template
returns <- c(1.2, -0.8, 2.1, -1.5, 0.9, -2.3, 1.8, -0.7, 2.5, -1.9,
0.6, -2.7, 1.4, -0.5, 2.8, -2.1, 0.8, -3.2, 1.7, -0.9,
2.3, -1.8, 0.7, -2.9, 1.5)
# Up/down runs test
if(require(randtests)) {
result2 <- runs.test(returns)
print(result2)
}
##
## Runs Test
##
## data: returns
## statistic = 4.1742, runs = 23, n1 = 12, n2 = 12, n = 24, p-value =
## 2.99e-05
## alternative hypothesis: nonrandomness
# Manual up/down test
manual_up_down_test(returns)
## Differences: -2 2.9 -3.6 2.4 -3.2 4.1 -2.5 3.2 -4.4 2.5 -3.3 4.1 -1.9 3.3 -4.9 2.9 -4 4.9 -2.6 3.2 -4.1 2.5 -3.6 4.4
## Signs: -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1
## Number of runs (R): 24
## n: 24
## Mean (μ_R): 15.66667
## Variance (σ²_R): 3.944444
## SD (σ_R): 1.986063
## Z-statistic: 4.195907
## $runs
## [1] 24
##
## $Z
## [1] 4.195907
##
## $p_value
## [1] 2.71782e-05
# Above/below mean test for comparison
if(require(randtests)) {
result3 <- runs.test(returns, threshold = mean(returns))
print(result3)
}
##
## Runs Test
##
## data: returns
## statistic = 4.715, runs = 25, n1 = 13, n2 = 12, n = 25, p-value =
## 2.417e-06
## alternative hypothesis: nonrandomness
# Comprehensive analysis function
comprehensive_runs_analysis <- function(data, alpha = 0.05) {
cat("=== COMPREHENSIVE RUNS ANALYSIS ===\n")
cat("Sample size:", length(data), "\n")
cat("Mean:", mean(data), "\n")
cat("Median:", median(data), "\n\n")
# Above/below median test
cat("1. ABOVE/BELOW MEDIAN TEST:\n")
med_test <- manual_runs_test(data, median(data))
cat("P-value:", med_test$p_value, "\n")
cat("Significant at", alpha, "level:", med_test$p_value < alpha, "\n\n")
# Up/down test
cat("2. UP/DOWN RUNS TEST:\n")
up_down_test <- manual_up_down_test(data)
cat("P-value:", up_down_test$p_value, "\n")
cat("Significant at", alpha, "level:", up_down_test$p_value < alpha, "\n\n")
# Confidence interval for runs
n1 <- sum(data > median(data))
n2 <- sum(data < median(data))
mu_R <- (2*n1*n2)/(n1+n2) + 1
var_R <- (2*n1*n2*(2*n1*n2 - n1 - n2))/((n1+n2)^2*(n1+n2-1))
sigma_R <- sqrt(var_R)
z_critical <- qnorm(1 - alpha/2)
CI_lower <- mu_R - z_critical * sigma_R
CI_upper <- mu_R + z_critical * sigma_R
cat("3. CONFIDENCE INTERVAL FOR RUNS:\n")
cat(100*(1-alpha), "% CI: [", CI_lower, ", ", CI_upper, "]\n")
cat("Observed runs:", med_test$runs, "\n")
cat("Within CI:", (med_test$runs >= CI_lower & med_test$runs <= CI_upper), "\n")
}
comprehensive_runs_analysis(returns)
## === COMPREHENSIVE RUNS ANALYSIS ===
## Sample size: 25
## Mean: -0.04
## Median: 0.6
##
## 1. ABOVE/BELOW MEDIAN TEST:
## Sequence: A B A B A B A B A B B B A B A B A B A B A B A B A
## Number of runs (R): 23
## n1 (A's): 12 n2 (B's): 13
## Mean (μ_R): 13.48
## Variance (σ²_R): 5.9696
## SD (σ_R): 2.443276
## Z-statistic: 3.896407
## P-value: 9.763021e-05
## Significant at 0.05 level: TRUE
##
## 2. UP/DOWN RUNS TEST:
## Differences: -2 2.9 -3.6 2.4 -3.2 4.1 -2.5 3.2 -4.4 2.5 -3.3 4.1 -1.9 3.3 -4.9 2.9 -4 4.9 -2.6 3.2 -4.1 2.5 -3.6 4.4
## Signs: -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1
## Number of runs (R): 24
## n: 24
## Mean (μ_R): 15.66667
## Variance (σ²_R): 3.944444
## SD (σ_R): 1.986063
## Z-statistic: 4.195907
## P-value: 2.71782e-05
## Significant at 0.05 level: TRUE
##
## 3. CONFIDENCE INTERVAL FOR RUNS:
## 95 % CI: [ 8.304616 , 17.69538 ]
## Observed runs: 23
## Within CI: FALSE
8 Rank Correlation Coefficient
Since the assumptions underlying the significance test for the correlation coefficient are rather stringent, it is sometimes preferable to use a non-parametric alternative. The most popular among the non-parametric measures of association is the rank correlation coefficient, also called Spearman’s rank correlation.
For a given set of data \((x_1,y_1), (x_2,y_2), ... , (x_n,y_n)\), we rank the \(x\)'s among themselves and also the \(y\)'s, both from low to high or both from high to low. The rank correlation coefficient is then given by
\[r_s=1-\frac{6\sum d_{i}^{2}}{n(n^2 -1)}\]
Where \(d_i\) is the difference between the ranks assigned to \(x_i\) and \(y_i\)
Where there are ties in rank we assign the tied observations the mean of the ranks which they jointly occupy.
8.1 Solved Example
The following are the number of hours which ten students studied for an examination and the marks which they obtained.
| No. of Hours | Marks Scored |
|---|---|
| 8 | 56 |
| 5 | 44 |
| 11 | 79 |
| 13 | 72 |
| 10 | 70 |
| 5 | 54 |
| 18 | 94 |
| 15 | 85 |
| 2 | 33 |
| 8 | 65 |
Required
Calculate rank correlation coefficient
Solution
| No. of Hours | Marks Scored | Rank (H) | Rank (M) | (d^2) |
|---|---|---|---|---|
| 8 | 56 | 6.5 | 7 | 0.25 |
| 5 | 44 | 8.5 | 9 | 0.25 |
| 11 | 79 | 4 | 3 | 1.00 |
| 13 | 72 | 3 | 4 | 1.00 |
| 10 | 70 | 5 | 5 | 0.00 |
| 5 | 54 | 8.5 | 8 | 0.25 |
| 18 | 94 | 1 | 1 | 0.00 |
| 15 | 85 | 2 | 2 | 0.00 |
| 2 | 33 | 10 | 10 | 0.00 |
| 8 | 65 | 6.5 | 6 | 0.25 |
\[r_s=1-\frac{6\sum d_{i}^{2}}{n(n^2 -1)}=1-\frac{6*3}{10(99)}=0.98\]
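The calculation can be reproduced in R. The sketch below (variable names are ours) computes \(r_s\) from the defining formula with averaged ranks and compares it with R's built-in Spearman correlation; with ties the two can differ slightly in the last decimals.
# Spearman's rank correlation for the hours/marks example (variable names are ours)
hours <- c(8, 5, 11, 13, 10, 5, 18, 15, 2, 8)
marks <- c(56, 44, 79, 72, 70, 54, 94, 85, 33, 65)
d <- rank(hours) - rank(marks)                 # rank() averages tied ranks
n <- length(hours)
1 - 6 * sum(d^2) / (n * (n^2 - 1))             # about 0.98
cor(hours, marks, method = "spearman")         # built-in check, about 0.98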
For small values of \(n\) (\(n\leq 10\)), the test of the null hypothesis of no correlation may be based on special tables determined from the exact sampling distribution of \(r_s\). Most of the time, though, we use the fact that the distribution of \(r_s\) can be approximated closely by a normal distribution.
Theorem
Under the null hypothesis of no correlation the mean and variance of \(r_s\) are;
\[E(r_s)=0\] \[Var(r_s)=\frac{1}{n-1}\]
NOTE
Strictly speaking, the theorem applies when there are no ties, but the result can be used as an approximation unless the number of ties is large.
8.2 Solved Example
From the data of the example above, test whether the rank correlation coefficient is significant at \(\alpha=0.01\)
Solution
\(H_0\): There is no correlation vs \(H_1\): There is correlation
Test statistic
\[ Z=\frac{r_s -E(r_s)}{\sqrt{Var(r_s)}}=r_s \sqrt{n-1}=0.98 \times 3=2.94\]
We reject \(H_0\) if \(Z\leq -2.575\) or \(Z\geq 2.575\)
Conclusion: Since 2.94 > 2.575, we reject \(H_0\) and conclude that there is a strong positive relationship between study time and marks.
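The large-sample test takes only a couple of lines of R; this sketch simply plugs the example's values into \(Z = r_s\sqrt{n-1}\). A built-in alternative is cor.test with method = "spearman", which uses a different reference distribution, so its p-value need not match exactly.
# Normal-approximation test of H0: no correlation, using the values from the example
n  <- 10
rs <- 0.98
Z  <- rs * sqrt(n - 1)     # E(r_s) = 0 and Var(r_s) = 1/(n-1) under H0
Z                          # about 2.94; reject H0 at alpha = 0.01 since |Z| > 2.575
# cor.test(hours, marks, method = "spearman")   # built-in alternative (uses hours, marks from the sketch above)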
NOTE
When there are no ties in rank, \(r_s\) actually equals the correlation coefficient \(r\) calculated for the ranks. Let \(r_i\) and \(s_i\) be the ranks of \(x_i\) and \(y_i\). We find that
\[ \sum_{i=1}^{n} r_i = \sum_{i=1}^{n} s_i = \frac{n(n+1)}{2} \]
\[ \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} s_i^2 = \frac{n(n+1)(2n+1)}{6} \]
Let
\[
d_i = r_i - s_i
\]
Then,
\[
d_i^2 = (r_i - s_i)^2 = r_i^2 - 2r_i s_i + s_i^2
\]
Summing over all \(i\),
\[
\sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} r_i^2 - 2\sum_{i=1}^{n} r_i s_i + \sum_{i=1}^{n} s_i^2
\]
Rearranging,
\[
2\sum_{i=1}^{n} r_i s_i = \sum_{i=1}^{n} r_i^2 + \sum_{i=1}^{n} s_i^2 - \sum_{i=1}^{n} d_i^2
\]
Hence,
\[
\sum_{i=1}^{n} r_i s_i = \frac{\sum_{i=1}^{n} r_i^2 + \sum_{i=1}^{n} s_i^2 - \sum_{i=1}^{n} d_i^2}{2}
\]
Substituting
\(\sum r_i^2 = \sum s_i^2 = \frac{n(n+1)(2n+1)}{6}\),
we get:
\[ \sum_{i=1}^{n} r_i s_i = \frac{n(n+1)(2n+1)}{6} - \frac{\sum_{i=1}^{n} d_i^2}{2} \]
\[ r = \frac{ \sum_{i=1}^{n} r_i s_i - \frac{ \left(\sum_{i=1}^{n} r_i\right) \left(\sum_{i=1}^{n} s_i\right) }{n} }{ \sqrt{ \left(\sum_{i=1}^{n} r_i^2 - n\overline{r}^2\right) \left(\sum_{i=1}^{n} s_i^2 - n\overline{s}^2\right) } } \]
\[ r = \frac{ \frac{n(n+1)(2n+1)}{6} - \frac{\sum_{i=1}^{n} d_i^2}{2} - \frac{1}{n} \frac{n(n+1)}{2} \frac{n(n+1)}{2} }{ \sqrt{ \left[\frac{n(n+1)(2n+1)}{6} - n\left(\frac{n+1}{2}\right)^2\right] \left[\frac{n(n+1)(2n+1)}{6} - n\left(\frac{n+1}{2}\right)^2\right] } } \]
\[ r = \frac{ \left[\frac{n(n+1)(2n+1)}{6} - \frac{(n\frac{(n+1)}{2})^2}{n}\right] - \frac{\sum_{i=1}^{n} d_i^2}{2} }{ \left[\frac{n(n+1)(2n+1)}{6} - \frac{(n\frac{(n+1)}{2})^2}{n}\right] } \]
\[ r = 1 - \frac{ \frac{\sum_{i=1}^{n} d_i^2}{2} }{ \left[ \frac{n(n+1)(2n+1)}{6} - \frac{(n\frac{(n+1)}{2})^2}{n} \right] } \]
Workings
\[ \left[\frac{n(n+1)(2n+1)}{6}-\frac{\big(n\frac{(n+1)}{2}\big)^{2}}{n}\right]. \]
Factor and simplify:
\[ \begin{aligned} \frac{n(n+1)(2n+1)}{6}-\frac{\big(n\frac{(n+1)}{2}\big)^{2}}{n} &= \frac{n+1}{12}\big[2n(2n+1)-3n(n+1)\big] \\[6pt] &= \frac{n+1}{12}\big[4n^2+2n-3n^2-3n\big] \\[6pt] &= \frac{n+1}{12}\big[n^2-n\big] \\[6pt] &= \frac{n(n+1)(n-1)}{12} \\[6pt] &= \frac{n(n^2-1)}{12}. \end{aligned} \]
Substituting this into the earlier expression for \(r\) gives
\[ r = 1 - \frac{\dfrac{\sum_{i=1}^{n} d_i^2}{2}}{\dfrac{n(n^2-1)}{12}} = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)}. \]
This is the standard form of Spearman’s rank correlation coefficient:
\[ r_s \;=\; 1 - \frac{6\sum d_i^2}{n(n^2-1)}. \]
8.3 Exercise & Assessment
- The following table shows how a panel of nutrition experts and a panel of housewives ranked 15 breakfast foods on their palatability.
| Nutrition | Housewives |
|---|---|
| 3 | 5 |
| 7 | 4 |
| 11 | 8 |
| 9 | 14 |
| 1 | 2 |
| 4 | 6 |
| 10 | 12 |
| 8 | 7 |
| 5 | 1 |
| 13 | 15 |
| 12 | 9 |
| 2 | 3 |
| 15 | 10 |
| 6 | 11 |
| 4 | 13 |
Required
Calculate the rank correlation coefficient and test the null hypothesis of no correlation at \(\alpha=0.05\)
- The following are the marks which 12 students obtained in the mid-term and final Examination in a course in Statistics
| Midterm | Final Exam |
|---|---|
| 71 | 83 |
| 49 | 62 |
| 80 | 76 |
| 73 | 77 |
| 93 | 89 |
| 85 | 74 |
| 58 | 48 |
| 52 | 78 |
| 64 | 76 |
| 32 | 51 |
| 87 | 73 |
| 80 | 89 |
Required
Calculate the rank correlation coefficient and test the null hypothesis of no correlation at \(\alpha=0.05\)
- The Sociology marks (X) and Statistics marks (Y) of 10 students are given below
| X | Y |
|---|---|
| 78 | 80 |
| 86 | 74 |
| 49 | 63 |
| 94 | 85 |
| 53 | 55 |
| 97 | 90 |
| 74 | 85 |
| 53 | 71 |
| 58 | 67 |
| 62 | 64 |
Required: Test whether the rank correlation coefficient is significant at 5% level of significance
- An educational researcher wants to compare the effectiveness of two study methods. She randomly selects 15 students and has each student study a particular topic using method A and method B (in random order). The students’ test scores (out of 100) are recorded for both methods. The data is shown below:
| Student | Score (Method A) | Score (Method B) |
|---|---|---|
| 1 | 22 | 53 |
| 2 | 37 | 68 |
| 3 | 36 | 42 |
| 4 | 38 | 49 |
| 5 | 42 | 51 |
| 6 | 38 | 65 |
| 7 | 58 | 51 |
| 8 | 60 | 71 |
| 9 | 62 | 55 |
| 10 | 65 | 74 |
| 11 | 66 | 68 |
| 12 | 56 | 64 |
| 13 | 66 | 67 |
| 14 | 67 | 73 |
| 15 | 62 | 65 |
Required: Test if there is a significant difference between the two study methods at the 5% significance level.
9 The Sample Distribution Function (Empirical Distribution Function)
Suppose that the random variables \(X_1, X_2, ... ,X_n\) form a random sample from some continuous distribution, and let \(x_1, x_2, ... ,x_n\) denote the observed values of \(X_1, X_2, ... ,X_n\).
Since the observations come from a continuous distribution, there is probability zero that any two of the observed values \(x_1, x_2, ... ,x_n\) will be equal. Therefore, we shall assume for simplicity that all \(n\) values are different. We consider a function \(F_n(x)\) which is constructed from the values \(x_1, x_2, ... ,x_n\) as follows.
For each number \(x\) \((-\infty <x<\infty)\), the value of \(F_n(x)\) is defined as the proportion of observed values in the sample that are less than or equal to \(x\). In other words, if exactly \(k\) of the observed values in the sample are less than or equal to \(x\), then \(F_n(x)=\frac{k}{n}\). The function \(F_n(x)\) defined this way is called the sample distribution function (s.d.f.). Sometimes \(F_n(x)\) is called the empirical distribution function.
If we let \(y_1<y_2< ... <y_n\) denote the values of the order statistics of the sample, then \(F_n(x)=0\) for \(x<y_1\); \(F_n(x)\) jumps to the value \(\frac{1}{n}\) at \(x=y_1\) and remains at \(\frac{1}{n}\) for \(y_1\leq x<y_2\); \(F_n(x)\) jumps to \(\frac{2}{n}\) at \(x=y_2\) and remains at \(\frac{2}{n}\) for \(y_2\leq x<y_3\); and so on.
Now let \(F(x)\) denote the distribution function of the distribution from which the random sample \(x_1, x_2, ... ,x_n\) was drawn. For any given number \(x\) \((-\infty <x<\infty)\), the probability that any particular observation is less than or equal to \(x\) is \(F(x)\). Therefore, it follows from the law of large numbers that as \(n\rightarrow \infty\), the proportion \(F_n(x)\) of the observations in the sample which are less than or equal to \(x\) will converge to \(F(x)\) in probability. In symbols,
\[ F_n(x) \xrightarrow{P} F(x) \quad \text{as } n\to\infty, \qquad -\infty<x<\infty, \]
where \(n\) is the sample size of \(x_1,x_2, ... ,x_n\).
The relation above expresses the fact that at each point \(x\), the sample distribution function \(F_n(x)\) will converge to the actual distribution function \(F(x)\) of the distribution from which the random sample was drawn.
The Sample Distribution Function (also called Empirical Distribution Function - EDF) is a non-parametric estimator of the cumulative distribution function (CDF) of a random variable. It is defined as the proportion of sample observations that are less than or equal to a given value x.
For a sample \(X_1, X_2, \ldots, X_n\), the Empirical Distribution Function (EDF) is defined as:
\[F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \leq x)\]
where \(I(\cdot)\) is the indicator function that equals 1 if the condition is true and 0 otherwise.
Key Properties:
Non-parametric: Makes no assumptions about underlying distribution
Step function: Jumps by \(1/n\) at each data point (or by \(k/n\) for \(k\) ties)
Consistent estimator: \(F_n(x)\to F(x)\) as \(n\to \infty\) (Glivenko–Cantelli Theorem); a short simulation after the Applications list below illustrates this
- Unbiased:
\[E[F_n(x)]=F(x)\]
- Applications:
Goodness-of-fit tests (Kolmogorov-Smirnov, Anderson-Darling)
Non-parametric confidence intervals
Visualizing data distribution
Comparing multiple distributions
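As promised above, a short simulation illustrates the consistency property: for a simulated N(0,1) sample, the largest gap between the EDF and the true CDF (measured over a fine grid, as an approximation to the supremum) shrinks as \(n\) grows.
# Illustration of F_n(x) -> F(x): maximum EDF-CDF gap over a grid for growing n
set.seed(1)
grid <- seq(-3, 3, by = 0.01)
for (n in c(20, 200, 2000)) {
  x  <- rnorm(n)
  Fn <- ecdf(x)
  cat("n =", n, " max |F_n - F| over grid =", round(max(abs(Fn(grid) - pnorm(grid))), 4), "\n")
}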
9.1 Solved Example
Given the following sample of 10 measurements:
Data: 12.5, 15.2, 18.7, 14.1, 16.8, 13.9, 17.3, 15.6, 19.2, 14.8
Construct the empirical distribution function.
Solution
Step 1: Sort the Data
Sorted values: 12.5, 13.9, 14.1, 14.8, 15.2, 15.6, 16.8, 17.3, 18.7, 19.2
Step 2: Calculate EDF Values
| Interval | # of observations ≤ x | Fₙ(x) |
|---|---|---|
| x < 12.5 | 0 | 0.0 |
| 12.5 ≤ x < 13.9 | 1 | 0.1 |
| 13.9 ≤ x < 14.1 | 2 | 0.2 |
| 14.1 ≤ x < 14.8 | 3 | 0.3 |
| 14.8 ≤ x < 15.2 | 4 | 0.4 |
| 15.2 ≤ x < 15.6 | 5 | 0.5 |
| 15.6 ≤ x < 16.8 | 6 | 0.6 |
| 16.8 ≤ x < 17.3 | 7 | 0.7 |
| 17.3 ≤ x < 18.7 | 8 | 0.8 |
| 18.7 ≤ x < 19.2 | 9 | 0.9 |
| x ≥ 19.2 | 10 | 1.0 |
Step 3: EDF Formula \[ F_n(x) = \begin{cases} 0.0 & \text{if } x < 12.5 \\ 0.1 & \text{if } 12.5 \leq x < 13.9 \\ 0.2 & \text{if } 13.9 \leq x < 14.1 \\ 0.3 & \text{if } 14.1 \leq x < 14.8 \\ 0.4 & \text{if } 14.8 \leq x < 15.2 \\ 0.5 & \text{if } 15.2 \leq x < 15.6 \\ 0.6 & \text{if } 15.6 \leq x < 16.8 \\ 0.7 & \text{if } 16.8 \leq x < 17.3 \\ 0.8 & \text{if } 17.3 \leq x < 18.7 \\ 0.9 & \text{if } 18.7 \leq x < 19.2 \\ 1.0 & \text{if } x \geq 19.2 \end{cases} \]
Step 4: Calculate Specific Values
Fₙ(14.0) = 0.2 (since 13.9 ≤ 14.0 < 14.1)
Fₙ(15.5) = 0.5
Fₙ(17.0) = 0.7
Fₙ(20.0) = 1.0
9.2 Solved Example : Manual Computation with Ties
Given the following sample with repeated values:
Data: 8, 12, 8, 15, 12, 18, 15, 20, 12, 8
Construct the empirical distribution function.
Solution
Step 1: Sort the Data and Count Frequencies
Sorted values: 8, 8, 8, 12, 12, 12, 15, 15, 18, 20
| Value | Frequency | Cumulative Frequency |
|---|---|---|
| 8 | 3 | 3 |
| 12 | 3 | 6 |
| 15 | 2 | 8 |
| 18 | 1 | 9 |
| 20 | 1 | 10 |
Step 2: Calculate EDF Values
| Interval | # of observations ≤ x | Fₙ(x) |
|---|---|---|
| x < 8 | 0 | 0.0 |
| 8 ≤ x < 12 | 3 | 0.3 |
| 12 ≤ x < 15 | 6 | 0.6 |
| 15 ≤ x < 18 | 8 | 0.8 |
| 18 ≤ x < 20 | 9 | 0.9 |
| x ≥ 20 | 10 | 1.0 |
Step 3: EDF Formula
\[ F_n(x) = \begin{cases} 0.0 & \text{if } x < 8 \\ 0.3 & \text{if } 8 \leq x < 12 \\ 0.6 & \text{if } 12 \leq x < 15 \\ 0.8 & \text{if } 15 \leq x < 18 \\ 0.9 & \text{if } 18 \leq x < 20 \\ 1.0 & \text{if } x \geq 20 \end{cases} \]
Step 4: Calculate Specific Values
Fₙ(10) = 0.3
Fₙ(12) = 0.6
Fₙ(16) = 0.8
Fₙ(19) = 0.9
R Code Implementation
# Example 1: Continuous Data
data1 <- c(12.5, 15.2, 18.7, 14.1, 16.8, 13.9, 17.3, 15.6, 19.2, 14.8)
# Using ecdf function
ecdf1 <- ecdf(data1)
print(ecdf1)
## Empirical CDF
## Call: ecdf(data1)
## x[1:10] = 12.5, 13.9, 14.1, ..., 18.7, 19.2
# Calculate specific values
cat("F(14.0) =", ecdf1(14.0), "\n")
## F(14.0) = 0.2
cat("F(15.5) =", ecdf1(15.5), "\n")
## F(15.5) = 0.5
cat("F(17.0) =", ecdf1(17.0), "\n")
## F(17.0) = 0.7
cat("F(20.0) =", ecdf1(20.0), "\n")
## F(20.0) = 1
# Plot EDF
plot(ecdf1, main = "Empirical Distribution Function",
xlab = "x", ylab = "F(x)", col = "blue")
grid()
# Manual calculation function
manual_edf <- function(data, x_values) {
sorted_data <- sort(data)
n <- length(data)
cat("Sorted data:", sorted_data, "\n")
cat("Sample size n =", n, "\n\n")
for(x in x_values) {
count <- sum(data <= x)
F_x <- count/n
cat(sprintf("F(%.1f) = %d/%d = %.2f\n", x, count, n, F_x))
}
# Return EDF function
edf_function <- function(x) {
sapply(x, function(x_val) sum(data <= x_val)/n)
}
return(edf_function)
}
# Test manual function
x_test <- c(14.0, 15.5, 17.0, 20.0)
manual_edf(data1, x_test)
## Sorted data: 12.5 13.9 14.1 14.8 15.2 15.6 16.8 17.3 18.7 19.2
## Sample size n = 10
##
## F(14.0) = 2/10 = 0.20
## F(15.5) = 5/10 = 0.50
## F(17.0) = 7/10 = 0.70
## F(20.0) = 10/10 = 1.00
## function (x)
## {
## sapply(x, function(x_val) sum(data <= x_val)/n)
## }
## <bytecode: 0x000002445f304740>
## <environment: 0x000002446070ae40>
# Example 2: Data with Ties
data2 <- c(8, 12, 8, 15, 12, 18, 15, 20, 12, 8)
ecdf2 <- ecdf(data2)
print(ecdf2)
## Empirical CDF
## Call: ecdf(data2)
## x[1:5] = 8, 12, 15, 18, 20
# Calculate specific values
cat("F(10) =", ecdf2(10), "\n")
## F(10) = 0.3
cat("F(12) =", ecdf2(12), "\n")
## F(12) = 0.6
cat("F(16) =", ecdf2(16), "\n")
## F(16) = 0.8
cat("F(19) =", ecdf2(19), "\n")
## F(19) = 0.9
# Plot both EDFs for comparison
plot(ecdf1, main = "Comparison of EDFs", col = "blue", lwd = 2)
plot(ecdf2, col = "red", lwd = 2, add = TRUE)
legend("bottomright", legend = c("Data1", "Data2"),
col = c("blue", "red"), lwd = 2)
grid()
9.3 Exercise & Assignment
- A researcher collected reaction times (in milliseconds) from 15 participants:
Data: 245, 289, 267, 312, 278, 295, 301, 256, 324, 288, 273, 299, 315, 282, 307
Required:
Construct the empirical distribution function manually
Calculate Fₙ(280), Fₙ(300), Fₙ(320)
Plot the EDF using R
Estimate P(270 ≤ X ≤ 310) using the EDF
Find the 80th percentile using the EDF
- Two different teaching methods were tested. Exam scores were recorded:
Method A: 78, 85, 92, 65, 70, 88, 75, 82, 95, 80
Method B: 72, 68, 80, 75, 82, 79, 74, 85, 78, 90
Required:
Construct EDFs for both groups manually
Plot both EDFs on the same graph using R
Calculate Fₙ(80) for both groups
Estimate the median from each EDF
Perform visual comparison and interpret differences
Calculate the maximum vertical distance between EDFs (Kolmogorov-Smirnov statistic)
R Code Template for Assignments
# Assignment Question 1 Template
reaction_times <- c(245, 289, 267, 312, 278, 295, 301, 256, 324, 288, 273, 299, 315, 282, 307)
# Manual EDF construction function
construct_edf <- function(data) {
sorted <- sort(data)
n <- length(data)
cat("Sorted data:", sorted, "\n")
cat("Sample size:", n, "\n\n")
# Create EDF table
unique_vals <- unique(sorted)
edf_table <- data.frame(
x_min = c(-Inf, unique_vals),
x_max = c(unique_vals, Inf),
F_n = c(0, cumsum(table(sorted))/n)
)
print(edf_table)
# EDF function
edf <- function(x) {
sapply(x, function(x_val) sum(data <= x_val)/n)
}
return(edf)
}
edf_rt <- construct_edf(reaction_times)
## Sorted data: 245 256 267 273 278 282 288 289 295 299 301 307 312 315 324
## Sample size: 15
##
## x_min x_max F_n
## -Inf 245 0.00000000
## 245 245 256 0.06666667
## 256 256 267 0.13333333
## 267 267 273 0.20000000
## 273 273 278 0.26666667
## 278 278 282 0.33333333
## 282 282 288 0.40000000
## 288 288 289 0.46666667
## 289 289 295 0.53333333
## 295 295 299 0.60000000
## 299 299 301 0.66666667
## 301 301 307 0.73333333
## 307 307 312 0.80000000
## 312 312 315 0.86666667
## 315 315 324 0.93333333
## 324 324 Inf 1.00000000
# Calculate specific values
cat("F(280) =", edf_rt(280), "\n")
## F(280) = 0.3333333
cat("F(300) =", edf_rt(300), "\n")
## F(300) = 0.6666667
cat("F(320) =", edf_rt(320), "\n")
## F(320) = 0.9333333
# Probability estimation
P_270_310 <- edf_rt(310) - edf_rt(270)
cat("P(270 ≤ X ≤ 310) ≈", P_270_310, "\n")
## P(270 ≤ X ≤ 310) ≈ 0.6
# Percentile estimation
find_percentile <- function(edf, data, p) {
sorted <- sort(data)
n <- length(data)
index <- ceiling(p * n)
return(sorted[index])
}
cat("80th percentile ≈", find_percentile(edf_rt, reaction_times, 0.8), "\n")
## 80th percentile ≈ 307
# Plot
plot(ecdf(reaction_times), main = "EDF of Reaction Times",
xlab = "Reaction Time (ms)", ylab = "F(x)", col = "blue", lwd = 2)
grid()
# Assignment Question 2 Template
method_A <- c(78, 85, 92, 65, 70, 88, 75, 82, 95, 80)
method_B <- c(72, 68, 80, 75, 82, 79, 74, 85, 78, 90)
# Construct EDFs
edf_A <- construct_edf(method_A)
## Sorted data: 65 70 75 78 80 82 85 88 92 95
## Sample size: 10
##
## x_min x_max F_n
## -Inf 65 0.0
## 65 65 70 0.1
## 70 70 75 0.2
## 75 75 78 0.3
## 78 78 80 0.4
## 80 80 82 0.5
## 82 82 85 0.6
## 85 85 88 0.7
## 88 88 92 0.8
## 92 92 95 0.9
## 95 95 Inf 1.0
edf_B <- construct_edf(method_B)
## Sorted data: 68 72 74 75 78 79 80 82 85 90
## Sample size: 10
##
## x_min x_max F_n
## -Inf 68 0.0
## 68 68 72 0.1
## 72 72 74 0.2
## 74 74 75 0.3
## 75 75 78 0.4
## 78 78 79 0.5
## 79 79 80 0.6
## 80 80 82 0.7
## 82 82 85 0.8
## 85 85 90 0.9
## 90 90 Inf 1.0
# Compare specific values
cat("F_A(80) =", edf_A(80), "\n")
## F_A(80) = 0.5
cat("F_B(80) =", edf_B(80), "\n")
## F_B(80) = 0.7
# Medians
median_A <- find_percentile(edf_A, method_A, 0.5)
median_B <- find_percentile(edf_B, method_B, 0.5)
cat("Median A =", median_A, "\n")
## Median A = 80
cat("Median B =", median_B, "\n")
## Median B = 78
# Kolmogorov-Smirnov statistic
ks_statistic <- function(data1, data2) {
all_vals <- sort(unique(c(data1, data2)))
edf1 <- ecdf(data1)
edf2 <- ecdf(data2)
differences <- abs(edf1(all_vals) - edf2(all_vals))
max_diff <- max(differences)
cat("K-S statistic =", max_diff, "\n")
return(max_diff)
}
ks_value <- ks_statistic(method_A, method_B)
## K-S statistic = 0.2
# Plot comparison
plot(ecdf(method_A), main = "Comparison of EDFs: Method A vs B",
xlab = "Exam Score", ylab = "F(x)", col = "blue", lwd = 2)
plot(ecdf(method_B), col = "red", lwd = 2, add = TRUE)
legend("bottomright", legend = c("Method A", "Method B"),
col = c("blue", "red"), lwd = 2)
grid()
10 Kolmogorov-Smirnov Test
10.1 The Kolmogorov-Smirnov Test
Suppose that we wish to test the simple null hypothesis that the unknown distribution function \(F(x)\) is actually a particular continuous distribution function \(F^*(x)\), against the general alternative that the actual distribution function is not \(F^*(x)\), i.e.
\[H_0 : F(x)=F^* (x), \quad -\infty < x < \infty\] \[H_1 : F(x) \neq F^* (x), \quad -\infty < x < \infty \]
This problem is a non-parametric problem because the unknown distribution from which the random sample was taken might be any continuous distribution. Let \(F_n(x)\) denote the sample distribution function (s.d.f.).
Let \[D_n=\sup_{-\infty<x<\infty} |F_n(x)- F^*(x)|\]
In other words, \(D_n\) is the maximum difference between the s.d.f. \(F_n(x)\) and the hypothesized \(F^*(x)\). When the null hypothesis is true, the probability distribution of \(D_n\) is the same for every possible continuous distribution function \(F^*(x)\); it does not depend on the particular distribution function \(F^*(x)\) being studied in a specified problem.
Tables of this distribution for various values of the sample size \(n\) have been developed and are tabulated in collections of statistical tables. The tables give the value \(D_n^\alpha\) such that \[P\left[D_n \leq D_n^ \alpha\right]=1-\alpha\] when \(H_0\) is true; equivalently,
\[P\left[D_n > D_n^ \alpha\right]=\alpha\]
The Kolmogorov-Smirnov (K-S) test is a non-parametric statistical test used to determine whether a sample comes from a specified distribution (one-sample test) or whether two samples come from the same distribution (two-sample test).
Key Concepts:
Non-parametric: No assumptions about distribution parameters
Based on EDF: Uses the Empirical Distribution Function
Test statistic: Maximum vertical distance between distributions
Sensitive to differences: In location, shape, and scale
Types of K-S Tests:
1. One-sample K-S test: Compares sample EDF with theoretical CDF
2. Two-sample K-S test: Compares EDFs of two samples
Hypotheses:
H₀: The sample follows the specified distribution (one-sample) or both samples come from the same distribution (two-sample)
H₁: The sample does not follow the specified distribution or the samples come from different distributions
Test Statistic
For one-sample KS Test
The test statistic \(D\) is defined as:
\[D = \sup_x \left| F_n(x) - F(x) \right|\]
- \(F_n(x)\): Empirical distribution function (EDF) of the sample.
- \(F(x)\): Theoretical cumulative distribution function (CDF).
- \(\sup_x\): The supremum (maximum) over all values of \(x\).
This measures the largest absolute difference between the empirical and theoretical distributions.
Two-Sample KS Test
The test statistic \(D\) is defined as:
\[D = \sup_x \left| F_{n1}(x) - F_{n2}(x) \right|\]
- \(F_{n1}(x)\): EDF of the first sample.
- \(F_{n2}(x)\): EDF of the second sample.
This measures the largest absolute difference between the two empirical distributions.
10.2 Solved Example
Test the hypothesis, by the Kolmogorov-Smirnov test, that the following sample comes from a standard normal distribution. Take \(\alpha=0.05\)
| -1.23 | 1.64 | -0.21 | 0.70 | 1.40 |
| 0.44 | -0.07 | 1.07 | -0.02 | -0.15 |
| 1.76 | 1.62 | 0.40 | -2.11 | -0.99 |
| -0.42 | 0.81 | 1.47 | -2.46 | 0.88 |
| 1.39 | 0.42 | 0.27 | -0.39 | -0.10 |
Solution
\[H_0 : F(x)=F^*(x) \quad \text{vs} \quad H_1 : F(x) \neq F^*(x), \qquad -\infty < x < \infty \]
where \(F^*(x)\) is \(\phi(x)\) the cumulative distribution of the standard normal distribution
\[ D_{25} = \sup_{-2.46 < x < 1.76} \, \big| F_{25}(x) - F^{*}(x) \big| = 0.137 \]
\(D_{25}^{0.05}=0.27\)
\(D_{25} =0.137<0.27\)
Conclusion: Since \(D_{25} =0.137<0.27\), we fail to reject \(H_0\) and conclude that the sample is consistent with a standard normal distribution at \(\alpha=0.05\).
Workings
We arrange the sample value in ascending order and for each x, we determine \(F_n(x)\) and \(F^*(x)\).
| i | xᵢ | Fₙ(x) | F*(x) | Absolute diff |
|---|---|---|---|---|
| 1 | -2.46 | 0.04 | 0.007 | 0.033 |
| 2 | -2.11 | 0.08 | 0.017 | 0.063 |
| 3 | -1.23 | 0.12 | 0.109 | 0.011 |
| 4 | -0.99 | 0.16 | 0.161 | 0.001 |
| 5 | -0.42 | 0.20 | 0.337 | 0.137 |
| 6 | -0.39 | 0.24 | 0.348 | 0.080 |
| 7 | -0.21 | 0.28 | 0.417 | 0.137 |
| 8 | -0.15 | 0.32 | 0.440 | 0.120 |
| 9 | -0.10 | 0.36 | 0.460 | 0.100 |
| 10 | -0.07 | 0.40 | 0.472 | 0.072 |
| 11 | -0.02 | 0.44 | 0.492 | 0.052 |
| 12 | 0.27 | 0.48 | 0.606 | 0.126 |
| 13 | 0.40 | 0.52 | 0.655 | 0.135 |
| 14 | 0.42 | 0.56 | 0.663 | 0.103 |
| 15 | 0.44 | 0.60 | 0.670 | 0.070 |
| 16 | 0.70 | 0.64 | 0.758 | 0.118 |
| 17 | 0.81 | 0.68 | 0.791 | 0.111 |
| 18 | 0.88 | 0.72 | 0.811 | 0.091 |
| 19 | 1.07 | 0.76 | 0.858 | 0.098 |
| 20 | 1.39 | 0.80 | 0.918 | 0.118 |
| 21 | 1.40 | 0.84 | 0.919 | 0.079 |
| 22 | 1.47 | 0.88 | 0.927 | 0.047 |
| 23 | 1.62 | 0.92 | 0.947 | 0.027 |
| 24 | 1.64 | 0.96 | 0.950 | 0.010 |
| 25 | 1.76 | 1.00 | 0.961 | 0.039 |
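The table can be reproduced in a few lines of R. This sketch (our own variable names) recomputes the differences \(|F_{25}(x_{(i)}) - \Phi(x_{(i)})|\) at the observed points, exactly as in the workings above; note that ks.test also examines the gap just below each jump, so its reported \(D\) can be somewhat larger than the value obtained here.
# Recomputing the workings above for the 25 observations
x <- c(-1.23, 1.64, -0.21, 0.70, 1.40, 0.44, -0.07, 1.07, -0.02, -0.15,
       1.76, 1.62, 0.40, -2.11, -0.99, -0.42, 0.81, 1.47, -2.46, 0.88,
       1.39, 0.42, 0.27, -0.39, -0.10)
n  <- length(x)
xs <- sort(x)
Fn <- (1:n) / n                 # EDF evaluated at the order statistics
max(abs(Fn - pnorm(xs)))        # about 0.137, below the 5% critical value 0.27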
10.3 Solved Example : One-Sample K-S Test (Manual)
Test whether the following sample comes from a normal distribution with μ=50, σ=10 (α=0.05)
Data: 45, 52, 58, 48, 60, 55, 53, 50, 47, 56
Solution
Step 1: State Hypotheses
H₀: The data follows N(50, 10²)
H₁: The data does not follow N(50, 10²)
Step 2: Sort Data and Calculate EDF
Sorted data: 45, 47, 48, 50, 52, 53, 55, 56, 58, 60
n = 10
| (x) | (i) | F_n(x) = i/n |
|---|---|---|
| 45 | 1 | 0.1 |
| 47 | 2 | 0.2 |
| 48 | 3 | 0.3 |
| 50 | 4 | 0.4 |
| 52 | 5 | 0.5 |
| 53 | 6 | 0.6 |
| 55 | 7 | 0.7 |
| 56 | 8 | 0.8 |
| 58 | 9 | 0.9 |
| 60 | 10 | 1.0 |
Step 3: Calculate Theoretical CDF F(x)
Using standard normal:\[z = (x - 50)/10\]
| (x) | (z) | (F(z)) |
|---|---|---|
| 45 | -0.5 | 0.3085 |
| 47 | -0.3 | 0.3821 |
| 48 | -0.2 | 0.4207 |
| 50 | 0.0 | 0.5000 |
| 52 | 0.2 | 0.5793 |
| 53 | 0.3 | 0.6179 |
| 55 | 0.5 | 0.6915 |
| 56 | 0.6 | 0.7257 |
| 58 | 0.8 | 0.7881 |
| 60 | 1.0 | 0.8413 |
Step 4: Calculate Differences
| x | Fₙ(x) | F(x) | Absolute difference |
|---|---|---|---|
| 45 | 0.1 | 0.3085 | 0.2085 |
| 47 | 0.2 | 0.3821 | 0.1821 |
| 48 | 0.3 | 0.4207 | 0.1207 |
| 50 | 0.4 | 0.5000 | 0.1000 |
| 52 | 0.5 | 0.5793 | 0.0793 |
| 53 | 0.6 | 0.6179 | 0.0179 |
| 55 | 0.7 | 0.6915 | 0.0085 |
| 56 | 0.8 | 0.7257 | 0.0743 |
| 58 | 0.9 | 0.7881 | 0.1119 |
| 60 | 1.0 | 0.8413 | 0.1587 |
Step 5: Find Maximum Difference
\[D = \sup_x \left| F_n(x) - F(x) \right| = max(0.2085, 0.1821, ..., 0.1587) = 0.2085\]
Step 6: Critical Value
For n=10, α=0.05, the critical value from K-S table is 0.409
Step 7: Decision
\[D = 0.2085 < 0.409\] ⇒ Fail to reject H₀
Conclusion: No evidence that the data doesn’t follow N(50,10²)
10.4 Solved Example : Two-Sample K-S Test (Manual)
Test whether two samples come from the same distribution (α=0.05)
Sample A: 12, 15, 18, 20, 22
Sample B: 10, 14, 16, 19, 24
Solution
Step 1: State Hypotheses
H₀: Both samples come from the same distribution
H₁: The samples come from different distributions
Step 2: Combine and Sort All Data
Combined sorted: 10(B), 12(A), 14(B), 15(A), 16(B), 18(A), 19(B), 20(A), 22(A), 24(B)
Step 3: Calculate EDFs for Both Samples
n₁ = 5, n₂ = 5
| x | Source | Fₙ₁(x) | Fₙ₂(x) | Absolute Difference |
|---|---|---|---|---|
| 10 | B | 0/5 = 0.0 | 1/5 = 0.2 | 0.2 |
| 12 | A | 1/5 = 0.2 | 1/5 = 0.2 | 0.0 |
| 14 | B | 1/5 = 0.2 | 2/5 = 0.4 | 0.2 |
| 15 | A | 2/5 = 0.4 | 2/5 = 0.4 | 0.0 |
| 16 | B | 2/5 = 0.4 | 3/5 = 0.6 | 0.2 |
| 18 | A | 3/5 = 0.6 | 3/5 = 0.6 | 0.0 |
| 19 | B | 3/5 = 0.6 | 4/5 = 0.8 | 0.2 |
| 20 | A | 4/5 = 0.8 | 4/5 = 0.8 | 0.0 |
| 22 | A | 5/5 = 1.0 | 4/5 = 0.8 | 0.2 |
| 24 | B | 5/5 = 1.0 | 5/5 = 1.0 | 0.0 |
Step 4: Calculate Test Statistic
\[D =\sup_x |Fₙ₁(x) - Fₙ₂(x)| = 0.2 \]
Step 5: Critical Value
For n₁=n₂=5, α=0.05, the critical value from K-S table is 0.6
Step 6: Decision
D = 0.2 < 0.6 ⇒ Fail to reject H₀
Conclusion: No evidence that the samples come from different distributions
R Code Implementation
# Example 1: One-Sample K-S Test
data1 <- c(45, 52, 58, 48, 60, 55, 53, 50, 47, 56)
# Test against normal distribution with mean=50, sd=10
ks_result1 <- ks.test(data1, "pnorm", mean=50, sd=10)
print(ks_result1)
##
## Exact one-sample Kolmogorov-Smirnov test
##
## data: data1
## D = 0.30854, p-value = 0.2421
## alternative hypothesis: two-sided
# Manual calculation verification
manual_ks_one_sample <- function(data, dist_func, ...) {
n <- length(data)
sorted_data <- sort(data)
# Empirical CDF
F_n <- (1:n)/n
# Theoretical CDF
F_theoretical <- dist_func(sorted_data, ...)
# Calculate differences
D_plus <- max(F_n - F_theoretical)
D_minus <- max(F_theoretical - c(0, F_n[-n]))
D <- max(D_plus, D_minus)
cat("D+ =", D_plus, "\n")
cat("D- =", D_minus, "\n")
cat("D =", D, "\n")
return(D)
}
D_manual <- manual_ks_one_sample(data1, pnorm, mean=50, sd=10)
## D+ = 0.1586553
## D- = 0.3085375
## D = 0.3085375
# Example 2: Two-Sample K-S Test
sample_A <- c(12, 15, 18, 20, 22)
sample_B <- c(10, 14, 16, 19, 24)
ks_result2 <- ks.test(sample_A, sample_B)
print(ks_result2)
##
## Exact two-sample Kolmogorov-Smirnov test
##
## data: sample_A and sample_B
## D = 0.2, p-value = 1
## alternative hypothesis: two-sided
# Manual calculation for two-sample
manual_ks_two_sample <- function(sample1, sample2) {
n1 <- length(sample1)
n2 <- length(sample2)
# Combine and sort all data
all_data <- sort(c(sample1, sample2))
# Calculate EDFs
F1 <- sapply(all_data, function(x) sum(sample1 <= x)/n1)
F2 <- sapply(all_data, function(x) sum(sample2 <= x)/n2)
# Calculate maximum difference
D <- max(abs(F1 - F2))
cat("Maximum difference D =", D, "\n")
# Create comparison table
comparison <- data.frame(
x = all_data,
F1 = F1,
F2 = F2,
diff = abs(F1 - F2)
)
print(comparison)
return(D)
}
D_manual_2 <- manual_ks_two_sample(sample_A, sample_B)
## Maximum difference D = 0.2
## x F1 F2 diff
## 1 10 0.0 0.2 0.2
## 2 12 0.2 0.2 0.0
## 3 14 0.2 0.4 0.2
## 4 15 0.4 0.4 0.0
## 5 16 0.4 0.6 0.2
## 6 18 0.6 0.6 0.0
## 7 19 0.6 0.8 0.2
## 8 20 0.8 0.8 0.0
## 9 22 1.0 0.8 0.2
## 10 24 1.0 1.0 0.0
10.5 Exercise & Assignment
- Using the Kolmogorov-Smirnov method, test the hypothesis that the following values form a random sample from a normal distribution with mean 2 and variance 4. Use \(\alpha=0.05\)
2.72 3.84 0.88 5.72 5.48 3.12 0.10 2.48 1.70 0.52 3.64 3.40 1.80 -0.52 -0.12 2.30 3.10 1.04 1.02
- Test the hypothesis, by the Kolmogorov-Smirnov test, that the following sample comes from a normal distribution with mean 0.07 and variance 1. (Take \(\alpha=0.05\))
0.36 0.92 -0.56 1.86 1.74 0.56 -0.95 0.24 -0.15 -0.48 -0.74 0.34 0.82 0.70 -0.10 -1.26 -1.06 0.15 0.55 -0.49
- A manufacturer claims their product has a lifetime following an exponential distribution with mean 100 hours. Test this claim using the following sample of 12 product lifetimes (hours):
Data: 85, 120, 65, 150, 95, 110, 78, 135, 88, 125, 102, 140
Required:
Perform manual one-sample K-S test (show all steps)
Test at α=0.05 significance level
Calculate exact p-value using approximation
Verify with R code
Interpret results for quality control
- Compare the distribution of test scores between two different teaching methods:
Method A: 78, 85, 92, 65, 70, 88, 75, 82, 95, 80, 72, 68
Method B: 72, 68, 80, 75, 82, 79, 74, 85, 78, 90, 86, 83
Required
Perform manual two-sample K-S test (show all steps)
Test at α=0.05 significance level
Calculate the critical value for the test
Verify with R code
Write comprehensive analysis report
R Code Template for Assignments
# Assignment Question 1 Template
lifetimes <- c(85, 120, 65, 150, 95, 110, 78, 135, 88, 125, 102, 140)
# One-sample K-S test for exponential distribution with mean=100
ks_exp <- ks.test(lifetimes, "pexp", rate=1/100)
print(ks_exp)
##
## Exact one-sample Kolmogorov-Smirnov test
##
## data: lifetimes
## D = 0.47795, p-value = 0.004841
## alternative hypothesis: two-sided
# Manual calculation
manual_ks_exp <- function(data, rate) {
n <- length(data)
sorted_data <- sort(data)
# Empirical CDF
F_n <- (1:n)/n
# Theoretical exponential CDF
F_exp <- pexp(sorted_data, rate=rate)
# Calculate differences
D_plus <- max(F_n - F_exp)
D_minus <- max(F_exp - c(0, F_n[-n]))
D <- max(D_plus, D_minus)
cat("Sample size:", n, "\n")
cat("D+ =", D_plus, "\n")
cat("D- =", D_minus, "\n")
cat("D =", D, "\n")
# Critical value approximation
critical_value <- 1.358 / sqrt(n)
cat("Critical value (α=0.05):", critical_value, "\n")
cat("Reject H0:", D > critical_value, "\n")
return(list(D = D, critical_value = critical_value))
}
result1 <- manual_ks_exp(lifetimes, 1/100)
## Sample size: 12
## D+ = 0.2231302
## D- = 0.4779542
## D = 0.4779542
## Critical value (α=0.05): 0.3920208
## Reject H0: TRUE
# Assignment Question 2 Template
method_A <- c(78, 85, 92, 65, 70, 88, 75, 82, 95, 80, 72, 68)
method_B <- c(72, 68, 80, 75, 82, 79, 74, 85, 78, 90, 86, 83)
# Two-sample K-S test
ks_two <- ks.test(method_A, method_B)
print(ks_two)
##
## Exact two-sample Kolmogorov-Smirnov test
##
## data: method_A and method_B
## D = 0.16667, p-value = 0.9923
## alternative hypothesis: two-sided
# Manual calculation with detailed output
detailed_ks_two_sample <- function(sample1, sample2, alpha=0.05) {
n1 <- length(sample1)
n2 <- length(sample2)
# Combine and sort all data
all_data <- sort(unique(c(sample1, sample2)))
# Calculate EDFs
F1 <- sapply(all_data, function(x) sum(sample1 <= x)/n1)
F2 <- sapply(all_data, function(x) sum(sample2 <= x)/n2)
differences <- abs(F1 - F2)
D <- max(differences)
max_index <- which.max(differences)
# Create detailed table
result_table <- data.frame(
x = all_data,
F1 = round(F1, 3),
F2 = round(F2, 3),
Difference = round(differences, 3)
)
cat("Detailed Comparison:\n")
print(result_table)
cat("\nMaximum difference at x =", all_data[max_index], "\n")
cat("Test statistic D =", D, "\n")
# Critical value
c_alpha <- sqrt(-0.5 * log(alpha/2))
critical_value <- c_alpha * sqrt((n1 + n2)/(n1 * n2))
cat("Critical value (α=", alpha, "):", critical_value, "\n")
cat("Reject H0:", D > critical_value, "\n")
return(list(D = D, critical_value = critical_value, table = result_table))
}
result2 <- detailed_ks_two_sample(method_A, method_B)
## Detailed Comparison:
## x F1 F2 Difference
## 1 65 0.083 0.000 0.083
## 2 68 0.167 0.083 0.083
## 3 70 0.250 0.083 0.167
## 4 72 0.333 0.167 0.167
## 5 74 0.333 0.250 0.083
## 6 75 0.417 0.333 0.083
## 7 78 0.500 0.417 0.083
## 8 79 0.500 0.500 0.000
## 9 80 0.583 0.583 0.000
## 10 82 0.667 0.667 0.000
## 11 83 0.667 0.750 0.083
## 12 85 0.750 0.833 0.083
## 13 86 0.750 0.917 0.167
## 14 88 0.833 0.917 0.083
## 15 90 0.833 1.000 0.167
## 16 92 0.917 1.000 0.083
## 17 95 1.000 1.000 0.000
##
## Maximum difference at x = 70
## Test statistic D = 0.1666667
## Critical value (α= 0.05 ): 0.5544426
## Reject H0: FALSE
# Visualization
plot(ecdf(method_A), col="blue", main="EDF Comparison: Method A vs B",
xlab="Test Scores", ylab="Cumulative Probability")
plot(ecdf(method_B), col="red", add=TRUE)
legend("bottomright", legend=c("Method A", "Method B"),
col=c("blue", "red"), lwd=2)
grid()
Advanced K-S Test Functions in R
# Comprehensive K-S analysis function
comprehensive_ks_analysis <- function(data1, data2 = NULL, dist = NULL, ...) {
cat("=== KOLMOGOROV-SMIRNOV TEST ANALYSIS ===\n\n")
if(is.null(data2)) {
# One-sample test
cat("ONE-SAMPLE K-S TEST\n")
cat("H0: Data follows", dist, "distribution\n")
cat("H1: Data does not follow", dist, "distribution\n\n")
result <- ks.test(data1, dist, ...)
print(result)
# Additional statistics
n <- length(data1)
cat("\nSample size:", n, "\n")
cat("Theoretical distribution:", dist, "\n")
} else {
# Two-sample test
cat("TWO-SAMPLE K-S TEST\n")
cat("H0: Both samples come from the same distribution\n")
cat("H1: Samples come from different distributions\n\n")
result <- ks.test(data1, data2)
print(result)
# Additional statistics
n1 <- length(data1)
n2 <- length(data2)
cat("\nSample sizes: n1 =", n1, ", n2 =", n2, "\n")
cat("Combined sample size:", n1 + n2, "\n")
}
return(result)
}
# Run analyses
comp_result1 <- comprehensive_ks_analysis(lifetimes, dist = "pexp", rate = 1/100)
## === KOLMOGOROV-SMIRNOV TEST ANALYSIS ===
##
## ONE-SAMPLE K-S TEST
## H0: Data follows pexp distribution
## H1: Data does not follow pexp distribution
##
##
## Exact one-sample Kolmogorov-Smirnov test
##
## data: data1
## D = 0.47795, p-value = 0.004841
## alternative hypothesis: two-sided
##
##
## Sample size: 12
## Theoretical distribution: pexp
comp_result2 <- comprehensive_ks_analysis(method_A, method_B)
## === KOLMOGOROV-SMIRNOV TEST ANALYSIS ===
##
## TWO-SAMPLE K-S TEST
## H0: Both samples come from the same distribution
## H1: Samples come from different distributions
##
##
## Exact two-sample Kolmogorov-Smirnov test
##
## data: data1 and data2
## D = 0.16667, p-value = 0.9923
## alternative hypothesis: two-sided
##
##
## Sample sizes: n1 = 12 , n2 = 12
## Combined sample size: 24
11 Confidence Intervals for Quantiles
Let \(X\) be a random variable of the continuous type with pdf \(f(x)\) and cdf \(F(x)\).
Let \(p\) denote a positive proper fraction and assume that the equation \[ F(x)=P(X\leq x)=p \] has a unique solution for \(x\). This unique root is denoted by \(\varepsilon_p\) and is called the quantile (of the distribution) of order \(p\). Therefore, \[ P(X\leq \varepsilon_p)=F(\varepsilon_p)=p \]
The quantile of order \(\frac{1}{2}\) is the median of the distribution, and, for example, \[ P(X\leq \varepsilon_{0.05})=F(\varepsilon_{0.05})=0.05 \]
To obtain a distribution-free confidence interval for \(\varepsilon_p\), the quantile of order \(p\) of a distribution of the continuous type with distribution function \(F(x)\), take a random sample \(X_1, X_2, ... ,X_n\) of size \(n\) from that distribution.
Let \(Y_1<Y_2< ...<Y_n\) be the order statistics of the sample.
Take \(Y_i<Y_j\) and consider the event \(Y_i<\varepsilon_p<Y_j\). For the \(i^{th}\) order statistic to be less than \(\varepsilon_p\), at least \(i\) of the \(X\) values must be less than \(\varepsilon_p\). Moreover, for the \(j^{th}\) order statistic to be greater than \(\varepsilon_p\), fewer than \(j\) of the values must be less than \(\varepsilon_p\).
\[ P\left[Y_i< \varepsilon_p <Y_j\right]=\sum_{r=i}^{j-1} \frac{n!}{r!(n-r)!}p^r(1-p)^{n-r} \]
is the probability of having at least \(i\) but fewer than \(j\) of the \(X_i\)'s less than \(\varepsilon_p\).
Suppose that \[ \alpha=P\left[Y_i< \varepsilon_p<Y_j\right] \]
then the probability is \(\alpha\) that the random interval \((Y_i,Y_j)\) includes the quantile of order \(p\).
If the experimental values of \(Y_i\) and \(Y_j\) are respectively \(y_i\) and \(y_j\), the interval \((y_i,y_j)\) serves as a \(100\alpha\) percent confidence interval (CI) for \(\varepsilon_p\), the quantile of order \(p\).
A confidence interval for a quantile is an interval estimate that, with a specified confidence level, contains the true population quantile. Unlike parametric confidence intervals for means, quantile CIs are distribution-free and based on order statistics.
Key Concepts:
Quantile: The value below which a given percentage of observations fall
Order statistics: The sorted sample values \(X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}\)
Distribution-free: No assumptions about the underlying distribution
Exact coverage: Based on binomial probabilities
Common Quantiles:
Median: 50th percentile (q = 0.5)
Quartiles: 25th, 50th, 75th percentiles
Percentiles: Any value between 0 and 1
Methods:
Exact binomial method (most common)
Normal approximation (for large samples)
11.1 Solved Example
Let \(X_1, X_2, X_3\) and \(X_4\) be a random sample of size \(4\) from a distribution of the continuous type.
Let \(Y_1<Y_2<Y_3<Y_4\) be the corresponding order statistics.
Required
Calculate \[P\left[Y_1< \varepsilon_{0.5}<Y_4\right]\]
Solution
We know that \[P\left[Y_i< \varepsilon_p<Y_j\right]=\sum_{r=i}^{j-1} \frac{n!}{r!(n-r)!}p^r(1-p)^{n-r}\]
Here \(i=1\), \(j=4\), \(n=4\) and \(p=0.5\), so \[P\left[Y_1< \varepsilon_{0.5} <Y_4\right]=\sum_{r=1}^{3} \frac{4!}{r!(4-r)!}(0.5)^r(1-0.5)^{4-r}\]
\[P\left[Y_1< \varepsilon_{0.5} <Y_4\right]=\left(\frac{24}{1!3!}*0.5^4\right)+\left(\frac{24}{2!2!}*0.5^4\right)+\left(\frac{24}{3!1!}*0.5^4\right)\]
\[P\left[Y_1< \varepsilon_{0.5} <Y_4\right]=\frac{1}{4}+\frac{3}{8}+\frac{1}{4}=0.875\]
If \(Y_1\) and \(Y_4\) are observed to be \(Y_1=2.8\) and \(Y_4=4.2\) respectively then the interval \((2.8,4.2)\) is an 87.5 percent CI for the median \(\varepsilon_{0.5}\) of the distribution.
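The coverage probability can be checked directly with R's binomial functions; the event \(Y_1 < \varepsilon_{0.5} < Y_4\) occurs exactly when between 1 and 3 of the four observations fall below the median.
# Coverage of (Y_1, Y_4) for the median with n = 4: P(1 <= R <= 3), R ~ Binomial(4, 0.5)
sum(dbinom(1:3, size = 4, prob = 0.5))    # 0.875
pbinom(3, 4, 0.5) - pbinom(0, 4, 0.5)     # same value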
\[P\left[Y_i< \varepsilon_p<Y_j\right]=\sum_{r=i}^{j-1} \frac{n!}{r!(n-r)!}p^r(1-p)^{n-r}\]
For samples that are fairly large, we can use the normal distribution to approximate the binomial probability.
Let \(n=27\) and suppose we want a confidence interval for \(\varepsilon_{0.25}\). We calculate \[np=27 \times 0.25=6.75\simeq 7,\]
i.e. the \(7^{th}\) order statistic \(Y_7\) is a point estimate of \(\varepsilon_{0.25}\). We consider two order statistics, one less than \(Y_7\) and another greater than \(Y_7\).
Let us take \(Y_4\) and \(Y_{10}\) and calculate the probability \(\alpha\) that the interval \((Y_4, Y_{10})\) includes \(\varepsilon_{0.25}\).
\[\alpha=P\left[Y_4< \varepsilon_{0.25}<Y_{10}\right]=\sum_{r=4}^{9} \frac{27!}{r!(27-r)!}(0.25)^r(0.75)^{27-r}\]
If \(R\) is the number of observations less than \(\varepsilon_{0.25}\), then, with a continuity correction,
\[\alpha=P\left[4 \leq R \leq 9\right]\approx P\left[3.5<R<9.5\right]\]
But \(R \sim b(27,0.25)=b\left(27,\frac{1}{4}\right)\), so
\[E(R)=np=27*0.25=6.75=\frac{27}{4}\]
\[Var(R)=npq=27*0.75*0.25=\frac{81}{16}\]
\[\alpha=P\left[\frac{(3.5-6.75)}{\sqrt{\frac{81}{16}}}<Z<\frac{(9.5-6.75)}{\sqrt{\frac{81}{16}}}\right]=P\left[ \frac{-13}{9}<Z<\frac{11}{9}\right]\]
\[\alpha=\Phi\left(\frac{11}{9}\right)-\Phi\left(\frac{-13}{9}\right)=0.814\]
The appropriate order statistics \(Y_i\) and \(Y_j\) are selected such that \(i\) and \(j\) are fairly symmetrically located about \(np\).
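The normal-approximation calculation above can be reproduced in R and compared with the exact binomial probability (a minimal sketch for \(n = 27\), \(p = 0.25\), \(i = 4\), \(j = 10\)):
# Normal approximation (with continuity correction) vs exact binomial probability
n <- 27; p <- 0.25
mu <- n * p                       # 6.75
sigma <- sqrt(n * p * (1 - p))    # 2.25
pnorm((9.5 - mu) / sigma) - pnorm((3.5 - mu) / sigma)   # approximately 0.815
pbinom(9, n, p) - pbinom(3, n, p)                       # exact value, approximately 0.82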
11.2 Solved Example : Manual Computation for Median
Construct a 90% confidence interval for the median from the following sample:
Data: 12, 15, 18, 20, 22, 25, 28, 30, 35, 40
Solution
Step 1: Sort the Data
Sorted: 12, 15, 18, 20, 22, 25, 28, 30, 35, 40
n = 10
Step 2: Determine Order Statistics
We need to find indices r and s such that:
\[ P\left( X_{(r)} \leq \xi_{0.5} \leq X_{(s)} \right) \geq 0.90 \]
Here:
- \(\xi_{0.5}\) is the population median.
- \(X_{(r)}\) and \(X_{(s)}\) are the \(r\)-th and \(s\)-th order statistics from a sample of size \(n\).
- The inequality represents a confidence interval for the median with at least 90% confidence.
Step 3: Use Binomial Probabilities
For the median (\(p = 0.5\)), the number of observations below \(\xi_{0.5}\) follows a Binomial(\(n\), 0.5) distribution.
We want
\[ P(r \leq k \leq s - 1) \geq 0.90 \]
where
\[ k \sim \text{Binomial}(n = 10, p = 0.5) \]
Step 4: Find Appropriate Indices
From binomial tables or calculations:
\(P(3 \leq k \leq 7)\)
\[ \begin{align*} P(k \leq 2) &= 0.0547 \\ P(k \leq 7) &= 0.9453 \\ P(3 \leq k \leq 7) &= P(k \leq 7) - P(k \leq 2) = 0.9453 - 0.0547 = 0.8906 \end{align*} \]
This yields a confidence level of approximately 89%.
Try:
\(P(2 \leq k \leq 8)\)
\[ \begin{align*} P(k \leq 1) &= 0.0107 \\ P(k \leq 8) &= 0.9893 \\ P(2 \leq k \leq 8) &= P(k \leq 8) - P(k \leq 1) = 0.9893 - 0.0107 = 0.9786>0.90 \end{align*} \]
This yields a confidence level of approximately 97.86%, which satisfies the requirement.
11.3 Conclusion
To achieve at least 90% confidence that the population median lies within the interval, we select:
\[ r = 2,\quad s = 9 \]
Thus, the 90% distribution-free confidence interval for the median is:
\[ [X_{(2)}, X_{(9)}]=[15,35] \]
Step 5: Verify Coverage
The interval [15, 35] contains the true median with approximately 97.86% confidence.
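The binomial probabilities used in this example are easy to verify with pbinom:
# Verify: coverage of [X_(2), X_(9)] for the median when n = 10
pbinom(8, size = 10, prob = 0.5) - pbinom(1, size = 10, prob = 0.5)   # approximately 0.9786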
11.4 Solved Example : Manual Computation for 75th Percentile
Construct a 95% confidence interval for the 75th percentile from:
Data: 8, 12, 15, 18, 20, 22, 25, 28, 30, 32, 35, 38
Solution
Step 1: Sort the Data
Sorted: 8, 12, 15, 18, 20, 22, 25, 28, 30, 32, 35, 38
n = 12
Step 2: Determine Order Statistics
For q = 0.75 (75th percentile), we use binomial distribution with p = 0.75.
We want indices r and s such that:
\[ P\left( X_{(r)} \leq \xi_{0.75} \leq X_{(s)} \right) \geq 0.95 \]
where
\[ k \sim \text{Binomial}(n = 12, p = 0.75) \]
\[ P(r \leq k \leq s - 1)\geq 0.95 \]
Step 3: Calculate Cumulative Probabilities
P(k ≤ 6) = 0.0544
P(k ≤ 7) = 0.1576
P(k ≤ 8) = 0.3512
P(k ≤ 9) = 0.6093
P(k ≤ 10) = 0.8416
P(k ≤ 11) = 0.9683
Find interval with probability ≥ 0.95:
\[P(7 \leq k \leq 11) = P(k \leq 11) - P(k \leq 6) = 0.9683 - 0.0544 = 0.9139 \] This is too low.
\[P(6 \leq k \leq 11) = P(k \leq 11) - P(k \leq 5) = 0.9683 - 0.0143 = 0.9540 \geq 0.95\]
Therefore, use r = 6, s = 12
Step 4: Construct Confidence Interval
95% CI for 75th percentile = \[[X_{(6)}, X_{(12)}] = [22, 38]\]
Step 5: Verify Coverage
The interval [22, 38] contains the true 75th percentile with approximately 95.41% confidence.
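As before, the exact coverage can be confirmed with pbinom (it agrees with the R output further below):
# Verify: coverage of [X_(6), X_(12)] for the 75th percentile when n = 12
pbinom(11, size = 12, prob = 0.75) - pbinom(5, size = 12, prob = 0.75)   # approximately 0.9541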
R Code Implementation
# Example 1: Median CI
data1 <- c(12, 15, 18, 20, 22, 25, 28, 30, 35, 40)
# Manual calculation function
quantile_ci <- function(data, q = 0.5, conf.level = 0.90) {
n <- length(data)
sorted_data <- sort(data)
alpha <- 1 - conf.level
# Find indices using binomial distribution
prob_lower <- 0
r <- 0
s <- 0
# Try different intervals to find one with coverage >= conf.level
for(i in 1:n) {
for(j in i:n) {
coverage <- pbinom(j-1, n, q) - pbinom(i-1, n, q)
if(coverage >= conf.level) {
if(prob_lower == 0 || (j-i) < (s-r)) {
r <- i
s <- j
prob_lower <- coverage
}
}
}
}
ci_lower <- sorted_data[r]
ci_upper <- sorted_data[s]
cat("Sample size: n =", n, "\n")
cat("Quantile: q =", q, "\n")
cat("Confidence level:", conf.level, "\n")
cat("Indices: r =", r, ", s =", s, "\n")
cat("Confidence interval: [", ci_lower, ", ", ci_upper, "]\n")
cat("Exact coverage probability:", prob_lower, "\n")
return(list(ci = c(ci_lower, ci_upper), indices = c(r, s),
coverage = prob_lower))
}
# Calculate for Example 1
result1 <- quantile_ci(data1, q = 0.5, conf.level = 0.90)
## Sample size: n = 10
## Quantile: q = 0.5
## Confidence level: 0.9
## Indices: r = 2 , s = 8
## Confidence interval: [ 15 , 30 ]
## Exact coverage probability: 0.9345703
# Example 2: 75th percentile CI
data2 <- c(8, 12, 15, 18, 20, 22, 25, 28, 30, 32, 35, 38)
result2 <- quantile_ci(data2, q = 0.75, conf.level = 0.95)
## Sample size: n = 12
## Quantile: q = 0.75
## Confidence level: 0.95
## Indices: r = 6 , s = 12
## Confidence interval: [ 22 , 38 ]
## Exact coverage probability: 0.9540709
# Using boot package for comparison
if(require(boot)) {
# Bootstrap method
quantile_boot <- function(data, indices, q) {
return(quantile(data[indices], q))
}
boot_result1 <- boot(data1, statistic = quantile_boot, R = 1000, q = 0.5)
boot_ci1 <- boot.ci(boot_result1, conf = 0.90, type = "perc")
cat("\nBootstrap 90% CI for median:\n")
print(boot_ci1)
boot_result2 <- boot(data2, statistic = quantile_boot, R = 1000, q = 0.75)
boot_ci2 <- boot.ci(boot_result2, conf = 0.95, type = "perc")
cat("\nBootstrap 95% CI for 75th percentile:\n")
print(boot_ci2)
}
## Loading required package: boot
##
## Bootstrap 90% CI for median:
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = boot_result1, conf = 0.9, type = "perc")
##
## Intervals :
## Level Percentile
## 90% (18, 30 )
## Calculations and Intervals on Original Scale
##
## Bootstrap 95% CI for 75th percentile:
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = boot_result2, conf = 0.95, type = "perc")
##
## Intervals :
## Level Percentile
## 95% (22.00, 35.75 )
## Calculations and Intervals on Original Scale
11.5 Exercise & Assignment
- Let the order statistics of 27 observations in a random sample from a certain distribution of the continuous type be represented by the numbers
61 69 71 74 79 80 84 86 87 82 93 96 100 104 105 113 121 122 129 141 143 156 164 191 217 276
Required
Find a point estimate for \(\pi_{0.25}\).
Compute \(P(Y_4< \pi_{0.25} <Y_{10})\), where \(\pi_{0.25}\) is the 25th percentile.
- Let \(Y_1<Y_2<\cdots<Y_{20}\) denote the order statistics of a random sample of size 20 from a distribution of the continuous type.
Required: Compute \(P(Y_{12}<\pi_{0.7}<Y_{15})\)
Find the smallest value of \(n\) for which \(P(Y_1<\pi_{0.25}<Y_n)\geq 0.99\), where \(Y_1<Y_2<\cdots<Y_n\) are the order statistics of a random sample of size \(n\) from a continuous distribution.
Find the smallest value of \(n\) for which \(P(Y_1<\pi_{0.25}<Y_n)\geq 0.95\), where \(Y_1<Y_2<\cdots<Y_n\) are the order statistics of a random sample of size \(n\) from a continuous distribution.
A researcher collected reaction times (ms) from 15 participants:
Data: 245, 289, 267, 312, 278, 295, 301, 256, 324, 288, 273, 299, 315, 282, 307
Required:
Construct a 95% confidence interval for the median manually
Construct a 90% confidence interval for the 25th percentile manually
Verify both intervals using R
Calculate exact coverage probabilities
Interpret the results
- A survey collected annual incomes (in $1000) from 20 households:
Data: 45, 52, 38, 60, 47, 55, 42, 51, 49, 53, 58, 62, 55, 65, 60, 57, 63, 59, 61, 58
Required:
Construct 95% CIs for the 25th, 50th, and 75th percentiles manually
Compare the widths of these intervals
Use bootstrap method to construct 95% CIs for the same quantiles
Compare exact binomial method vs bootstrap results
Write a comprehensive analysis report
R Code Template for Assignments
# Assignment Question 1 Template
reaction_times <- c(245, 289, 267, 312, 278, 295, 301, 256, 324, 288, 273, 299, 315, 282, 307)
# Enhanced quantile CI function
enhanced_quantile_ci <- function(data, q = 0.5, conf.level = 0.95) {
n <- length(data)
sorted_data <- sort(data)
alpha <- 1 - conf.level
cat("=== QUANTILE CONFIDENCE INTERVAL ===\n")
cat("Sample size: n =", n, "\n")
cat("Quantile: q =", q, "\n")
cat("Confidence level:", conf.level, "\n\n")
# Method 1: Exact binomial method
cat("1. EXACT BINOMIAL METHOD:\n")
best_interval <- NULL
best_coverage <- 0
best_width <- Inf
for(i in 1:n) {
for(j in i:n) {
coverage <- pbinom(j-1, n, q) - pbinom(i-1, n, q)
if(coverage >= conf.level) {
width <- sorted_data[j] - sorted_data[i]
if(is.null(best_interval) || width < best_width) {
best_interval <- c(i, j)
best_coverage <- coverage
best_width <- width
}
}
}
}
if(is.null(best_interval)) {
cat("No interval found with required coverage.\n")
return(NULL)
}
r <- best_interval[1]
s <- best_interval[2]
ci_lower <- sorted_data[r]
ci_upper <- sorted_data[s]
cat("Indices: r =", r, ", s =", s, "\n")
cat("Order statistics: X_(", r, ") =", ci_lower, ", X_(", s, ") =", ci_upper, "\n")
cat("Confidence interval: [", ci_lower, ", ", ci_upper, "]\n")
cat("Interval width:", best_width, "\n")
cat("Exact coverage probability:", best_coverage, "\n\n")
# Method 2: Normal approximation for large samples
if(n > 20) {
cat("2. NORMAL APPROXIMATION:\n")
z <- qnorm(1 - alpha/2)
r_approx <- floor(n * q - z * sqrt(n * q * (1 - q)))
s_approx <- ceiling(n * q + z * sqrt(n * q * (1 - q)) + 1)
r_approx <- max(1, min(r_approx, n))
s_approx <- max(1, min(s_approx, n))
ci_lower_approx <- sorted_data[r_approx]
ci_upper_approx <- sorted_data[s_approx]
cat("Approximate indices: r =", r_approx, ", s =", s_approx, "\n")
cat("Approximate CI: [", ci_lower_approx, ", ", ci_upper_approx, "]\n\n")
}
return(list(
exact_ci = c(ci_lower, ci_upper),
exact_indices = c(r, s),
exact_coverage = best_coverage,
exact_width = best_width
))
}
# For median (50th percentile)
result_median <- enhanced_quantile_ci(reaction_times, q = 0.5, conf.level = 0.95)
## === QUANTILE CONFIDENCE INTERVAL ===
## Sample size: n = 15
## Quantile: q = 0.5
## Confidence level: 0.95
##
## 1. EXACT BINOMIAL METHOD:
## Indices: r = 4 , s = 12
## Order statistics: X_( 4 ) = 273 , X_( 12 ) = 307
## Confidence interval: [ 273 , 307 ]
## Interval width: 34
## Exact coverage probability: 0.9648438
# For 25th percentile
result_q25 <- enhanced_quantile_ci(reaction_times, q = 0.25, conf.level = 0.90)
## === QUANTILE CONFIDENCE INTERVAL ===
## Sample size: n = 15
## Quantile: q = 0.25
## Confidence level: 0.9
##
## 1. EXACT BINOMIAL METHOD:
## Indices: r = 2 , s = 8
## Order statistics: X_( 2 ) = 256 , X_( 8 ) = 289
## Confidence interval: [ 256 , 289 ]
## Interval width: 33
## Exact coverage probability: 0.9025194
# Assignment Question 2 Template
incomes <- c(45, 52, 38, 60, 47, 55, 42, 51, 49, 53, 58, 62, 55, 65, 60, 57, 63, 59, 61, 58)
# Multiple quantiles analysis
multiple_quantile_cis <- function(data, quantiles = c(0.25, 0.5, 0.75), conf.level = 0.95) {
results <- list()
cat("MULTIPLE QUANTILE CONFIDENCE INTERVALS\n")
cat("=====================================\n\n")
for(q in quantiles) {
cat("Quantile: ", q*100, "th percentile\n", sep = "")
result <- enhanced_quantile_ci(data, q = q, conf.level = conf.level)
results[[paste0("q", q*100)]] <- result
cat("\n")
}
return(results)
}
# Analyze all three quartiles
quartile_results <- multiple_quantile_cis(incomes)
## MULTIPLE QUANTILE CONFIDENCE INTERVALS
## =====================================
##
## Quantile: 25th percentile
## === QUANTILE CONFIDENCE INTERVAL ===
## Sample size: n = 20
## Quantile: q = 0.25
## Confidence level: 0.95
##
## 1. EXACT BINOMIAL METHOD:
## Indices: r = 2 , s = 10
## Order statistics: X_( 2 ) = 42 , X_( 10 ) = 55
## Confidence interval: [ 42 , 55 ]
## Interval width: 13
## Exact coverage probability: 0.961823
##
##
## Quantile: 50th percentile
## === QUANTILE CONFIDENCE INTERVAL ===
## Sample size: n = 20
## Quantile: q = 0.5
## Confidence level: 0.95
##
## 1. EXACT BINOMIAL METHOD:
## Indices: r = 6 , s = 15
## Order statistics: X_( 6 ) = 51 , X_( 15 ) = 60
## Confidence interval: [ 51 , 60 ]
## Interval width: 9
## Exact coverage probability: 0.9586105
##
##
## Quantile: 75th percentile
## === QUANTILE CONFIDENCE INTERVAL ===
## Sample size: n = 20
## Quantile: q = 0.75
## Confidence level: 0.95
##
## 1. EXACT BINOMIAL METHOD:
## Indices: r = 11 , s = 19
## Order statistics: X_( 11 ) = 57 , X_( 19 ) = 63
## Confidence interval: [ 57 , 63 ]
## Interval width: 6
## Exact coverage probability: 0.961823
# Bootstrap comparison
if(require(boot)) {
cat("BOOTSTRAP COMPARISON\n")
cat("===================\n\n")
for(q in c(0.25, 0.5, 0.75)) {
quantile_boot <- function(data, indices, q) {
return(quantile(data[indices], q))
}
boot_result <- boot(incomes, statistic = quantile_boot, R = 2000, q = q)
boot_ci <- boot.ci(boot_result, conf = 0.95, type = "perc")
cat(q*100, "th percentile bootstrap CI:\n", sep = "")
cat("Lower:", boot_ci$percent[4], "Upper:", boot_ci$percent[5], "\n\n")
}
}
## BOOTSTRAP COMPARISON
## ===================
##
## 25th percentile bootstrap CI:
## Lower: 44.25 Upper: 55
##
## 50th percentile bootstrap CI:
## Lower: 51.5 Upper: 59.5
##
## 75th percentile bootstrap CI:
## Lower: 57 Upper: 62.25
# Visualization
plot_quantile_cis <- function(data, quantile_results) {
sorted_data <- sort(data)
n <- length(data)
plot(1:n, sorted_data, type = "n",
xlab = "Order Statistics", ylab = "Data Values",
main = "Quantile Confidence Intervals")
points(1:n, sorted_data, pch = 19, col = "blue")
colors <- c("red", "green", "purple")
quantiles <- c(0.25, 0.5, 0.75)
for(i in 1:length(quantiles)) {
q <- quantiles[i]
result <- quantile_results[[paste0("q", q*100)]]
if(!is.null(result)) {
r <- result$exact_indices[1]
s <- result$exact_indices[2]
segments(r, sorted_data[r], s, sorted_data[s],
col = colors[i], lwd = 3)
points(c(r, s), sorted_data[c(r, s)],
col = colors[i], pch = 17, cex = 1.5)
}
}
legend("topleft",
legend = c("Data", "Q1 CI", "Median CI", "Q3 CI"),
col = c("blue", "red", "green", "purple"),
pch = c(19, 17, 17, 17), lwd = c(NA, 3, 3, 3))
}
plot_quantile_cis(incomes, quartile_results)