MATH 454: NON-PARAMETRIC METHODS

1 Introduction to Non-parametric Methods

In the theory of estimation and hypothesis testing, we assume that the observations available to the statistician come from a distribution whose exact form is known, even though the values of some parameters may be unknown.

In many of the problems discussed in non-parametric methods we shall not assume that the available observations come from a particular parametric family of distributions. We study inferences that can be made about the distribution from which the observations come, without making special assumptions about the form of that distribution.

Non-parametric statistics are methods that make few or no assumptions about the underlying population distribution from which the data are sampled (e.g., no assumption of a normal distribution).

They are “distribution-free” methods.

1.1 Key Characteristics

Flexibility: They work with data that isn’t normally distributed, is ordinal (ranked), or has outliers.

Focus on Ranks: Many non-parametric tests work by ranking the data rather than using the raw data values themselves.

Robustness: They are less affected by violations of assumptions that would invalidate parametric tests.

1.2 When to Use Them

  • When your data is not normally distributed.

  • When you have a small sample size.

  • When your data is on an ordinal scale (e.g., rankings, Likert scales).

  • When your data contains significant outliers.

1.3 Overview

Statistical tests can be broadly classified into parametric and non-parametric tests. Parametric tests assume underlying statistical distributions (e.g., normality), while non-parametric tests make fewer assumptions and are often used when data doesn’t meet parametric criteria.

1.4 Comparison Table

Table 1.1: Common Parametric Tests and Their Non-Parametric Equivalents
Parametric Test (Makes Assumptions) Non-Parametric Equivalent (Fewer Assumptions)
Independent t-test (compares 2 means) Mann-Whitney U Test (compares 2 medians)
Paired t-test (compares means from same group) Wilcoxon Signed-Rank Test
One-Way ANOVA (compares 3+ means) Kruskal-Wallis H Test
Pearson’s Correlation (linear relationship) Spearman’s Rank Correlation

1.5 Definitions of Key Concepts in Non-parametric Methods

1. Distribution-Free

The core idea. Methods do not assume the data follows a specific probability distribution (like the normal distribution). This makes them more flexible for real-world data.

2. Ranks

A fundamental data transformation. Instead of using the raw data values, observations are sorted and replaced by their position (rank). This minimizes the effect of outliers and non-normal shapes.

3. Ordinal Data

The type of data these methods often use. Data that can be ordered or ranked (e.g., 1st, 2nd, 3rd; “strongly agree” to “strongly disagree”), but the intervals between points are not necessarily equal.

4. Robustness

A key advantage. Non-parametric methods are less influenced by outliers or violations of common statistical assumptions. They provide reliable results even when the data is “messy.”

5. Median (not Mean)

The typical measure of central tendency. Since they don’t assume a symmetric distribution, non-parametric methods usually test hypotheses about the median rather than the mean.

2 The Sign Test

The Sign Test is a non-parametric hypothesis test that evaluates the median of a population. It uses only the signs (positive or negative) of differences between observations and a hypothesized median, ignoring the magnitude of those differences.

2.1 Key Characteristics

  • Simple: Considers only whether values are above or below a hypothesized median.
  • Robust: Not affected by outliers or non-normal distributions.
  • Uses: Applicable to one-sample or paired data.
  • Test Statistic: Based on the binomial distribution of positive/negative signs.

For the purpose of describing the Sign Test we assume that the p.d.f. \(f(x)\) is a continuous density and that the median, denoted by \(m\), is uniquely defined by

\[ \int_{m}^{\infty} f(x) dx=\frac{1}{2} \]

Let \(X_1, X_2, \dots, X_n\) be a random sample of \(X\) and consider testing the hypothesis

\(H_0 : m=m_0\) against \(H_1 : m>m_0\).

From the definition of the median, it follows that when \(H_0\) is true,

\[P(X \geq m_0)=P(X-m_0 \geq 0)=\frac{1}{2}\]

and therefore that \[P(X_i-m_0 \geq 0)=\frac{1}{2}, \quad i=1,2,3,\dots,n\]

Let \[ Z_i = \begin{cases} 1, & \text{if } X_i - m_0 \ge 0 \\ 0, & \text{if } X_i - m_0 < 0 \end{cases} \]

We note that each \(Z_i\) has a Bernoulli distribution with \(P=\frac{1}{2}\) when \(H_0\) is true. Since the \(Z_i\) are independent, the sum \[U=\sum_{i=1}^{n} Z_i\]

will be a binomial random variable corresponding to \(n\) independent Bernoulli trials with \(P=\frac{1}{2}\) when \(H_0\) is true.

Under \(H_1\)

most \(X_i\) will tend to be larger than \(m_0\), and therefore \(U\) will tend to exceed the value expected when \(H_0\) is true. As a result, the right tail of the binomial distribution should be chosen as the critical region for the test. When \(H_0\) is true:

Let

\[ Z_i = \begin{cases} 1, & \text{with probability } \frac{1}{2} \\ 0, & \text{with probability } \frac{1}{2} \end{cases} \]

\[ E(Z_i)=\left(0\times\frac{1}{2}\right)+\left(1\times\frac{1}{2}\right)=\frac{1}{2}\] \[ E(Z_{i}^{2})=\left(0^2\times\frac{1}{2}\right)+\left(1^2\times\frac{1}{2}\right)=\frac{1}{2}\] \[ U=Z_1+Z_2+\dots+Z_n\] \[ E(U)=E(Z_1)+E(Z_2)+\dots+E(Z_n)=\frac{1}{2}+\frac{1}{2}+\dots+\frac{1}{2}=\frac{n}{2}\] \[ Var(U)=Var(Z_1)+Var(Z_2)+\dots+Var(Z_n)\] \[ Var(Z_i)=E(Z_i^2)-[E(Z_i)]^2=\frac{1}{2}-\frac{1}{4}=\frac{1}{4}\]

Therefore

\[Var(U)=\frac{1}{4}+\frac{1}{4}+...+\frac{1}{4}=\frac{n}{4}\]

that is, \(E(U)=\frac{n}{2}\) and \(Var(U)=\frac{n}{4}\) when \(H_0\) is true.

For very small values of \(n\), the critical region for the test is obtained by accumulating right-tail binomial probabilities until a total probability of approximately \(\alpha\) has been reached.

For \(P=\frac{1}{2}\), the binomial distribution is approximated well by the normal distribution even for fairly small values of \(n\). Therefore, it usually suffices to use the normal approximation in these problems, via the standard normal variable.

For a right tail critical region, we have;

\[Z=\frac{U-\frac{1}{2}-\frac{n}{2}}{\sqrt{\frac{n}{4}}}\]

For a left tail critical region, we have; \[Z=\frac{U+\frac{1}{2}-\frac{n}{2}}{\sqrt{\frac{n}{4}}}\]
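These formulas are easy to package in R. The following is a minimal sketch of the one-sample sign test based on the normal approximation; the function name sign_test_z and its arguments are illustrative, not from the text.

# One-sample sign test via the normal approximation
# x: data, m0: hypothesized median, tail: which critical region is used
sign_test_z <- function(x, m0, tail = c("right", "left")) {
  tail <- match.arg(tail)
  d <- (x - m0)[x != m0]        # zero differences are discarded
  U <- sum(d > 0)               # number of positive signs
  n <- length(d)
  cc <- if (tail == "right") -0.5 else 0.5   # continuity correction
  (U + cc - n / 2) / sqrt(n / 4)
}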

2.2 Solved Example

The following data was obtained from testing the breaking strength of a ceramic tile manufactured through a new cheaper process;

20 42 18 21 19 26 21 20 22 35 22 24 18 20 32

Suppose that experience with the old process produced a median of 25, test the hypothesis at \(\alpha=0.05\). \(H_0 : m=25\) against \(H_1 : m<25\).

Solution

In testing \(H_0 : m=m_0\) against \(H_1 : m<m_0\)

We use the test statistic

\[Z=\frac{U+\frac{1}{2}-\frac{n}{2}}{\sqrt{\frac{n}{4}}}\] The rejection region is the left tail; we reject for small values of \(Z\), where \[ U=Z_1+Z_2+\dots+Z_n\] and the \(Z_i\) are computed from the differences \[X_1-m_0, X_2-m_0, \dots, X_n-m_0.\]

Subtracting 25 from each listed value we have;

-5 17 -7 -4 -6 1 -4 -5 -3 10 -3 -1 -7 -5 7

The corresponding \(Z_i\) values are:

0 1 0 0 0 1 0 0 0 1 0 0 0 0 1

\[U=\sum_{i=1}^{n} Z_i =4\] \(U\) has a binomial distribution with \(n=15\) and \(P=\frac{1}{2}\) when \(H_0\) is true.

\[Z=\frac{4+\frac{1}{2}-\frac{15}{2}}{\sqrt{\frac{15}{4}}}=-1.55\]

From the normal distribution table we have \(P(Z\leq -1.645)=0.05\).

The critical value is \(Z_{critical}=-1.645\).

Conclusion

Since \(-1.55>-1.645\), we fail to reject \(H_0\): there is insufficient evidence at the 0.05 significance level to conclude that the median of the new cheaper process is less than 25.

Remarks

Instead of recording the values of \(X_i\) in the above example, we can record only the signs of the differences \(X_i-25\), in which case we would get

- + - - - + - - - + - - - - +

The number of positive signs is 4, and the total number of positive signs gives the value of \(U=4\). It is for this reason that the test is called the Sign Test.
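The calculation above can be checked with the sign_test_z() sketch given earlier; the exact binomial p-value is also easy to obtain:

strength <- c(20, 42, 18, 21, 19, 26, 21, 20, 22, 35, 22, 24, 18, 20, 32)
sign_test_z(strength, m0 = 25, tail = "left")   # -1.549, as computed above
pbinom(4, size = 15, prob = 0.5)                # exact P(U <= 4) = 0.059 > 0.05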

2.3 Solved Example

The following data, in tonnes, are the amounts of sulphur oxides emitted by a large industrial plant on each of 40 days.

17 15 20 29 19 18 22 25 27 9
24 20 17 6 24 14 15 23 24 26
19 23 28 19 16 22 24 17 20 13
19 10 23 18 31 13 20 17 24 14

Test the hypothesis at \(\alpha=0.01\). \(H_0 : m=21.5\) against \(H_1 : m<21.5\).

Solution

In testing \(H_0 : m=21.5\) against \(H_1 : m<21.5\)

We use the test statistic

\[Z=\frac{U+\frac{1}{2}-\frac{n}{2}}{\sqrt{\frac{n}{4}}}\] The rejection region is the left tail; we reject for small values of \(Z\), where \[U=Z_1+Z_2+\dots+Z_n\] The signs of the differences \(X_i-21.5\), for \(i=1,2,\dots,n\), are as follows:

- - - + - - + + + -
+ - - - + - - + + +
- + + - - + + - - -
- - + - + - - - + -

The number of positive signs is 16; \(U=\sum Z_i=16\)

\(U\) has a binomial distribution with \(n=40\) and \(P=\frac{1}{2}\) when \(H_0\) is true.

The test statistic

\[Z=\frac{16+\frac{1}{2}-\frac{40}{2}}{\sqrt{\frac{40}{4}}}=-1.107\]

From the normal distribution table, the critical value at \(\alpha=0.01\) is \(-2.33\).

Conclusion

Since \(-1.107>-2.33\), we fail to reject \(H_0\): there is insufficient evidence at the 0.01 significance level to conclude that the median is less than 21.5.
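A short R check of this example, computing \(U\) and the left-tail \(Z\) directly:

emissions <- c(17, 15, 20, 29, 19, 18, 22, 25, 27, 9,
               24, 20, 17, 6, 24, 14, 15, 23, 24, 26,
               19, 23, 28, 19, 16, 22, 24, 17, 20, 13,
               19, 10, 23, 18, 31, 13, 20, 17, 24, 14)
U <- sum(emissions > 21.5)          # 16 positive signs
(U + 0.5 - 40 / 2) / sqrt(40 / 4)   # Z = -1.107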

To compute the One-Sample Sign Test manually, here is how to walk through the steps without using R:

2.4 Solved Example: One-Sample Sign Test

Test if the median exam score is greater than 75 using α = 0.05

Data: 78, 82, 65, 90, 72, 88, 79, 85, 95, 70
Hypothesized Median (H₀): 75
Significance Level (α): 0.05
Alternative Hypothesis (H₁): Median > 75 (one-tailed)

Solution

  1. Subtract the Hypothesized Median from Each Value
Score Difference (Score - 75) Sign
78 +3 +
82 +7 +
65 -10 -
90 +15 +
72 -3 -
88 +13 +
79 +4 +
85 +10 +
95 +20 +
70 -5 -
  2. Count Signs
  • Positive signs (+): 7

  • Negative signs (–): 3

  • Zero differences: 0

  • Total non-zero differences (n): 10

  3. Apply Binomial Test

Under H₀, the probability of a positive sign is 0.5.
We test whether the number of positive signs (7 out of 10) is significantly greater than expected.

Use the binomial formula or a calculator to find:

p-value = P(X ≥ 7) where X ~ Binomial(n = 10, p = 0.5)

\[P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\]

So,

\[P(X \geq 7) = \sum_{k=7}^{10} \binom{10}{k}(0.5)^{10}=0.1719\]

  4. Decision

Since p-value = 0.1719 > α = 0.05,
Fail to reject H₀: There is insufficient evidence to conclude that the median is greater than 75.
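The same p-value can be obtained in R, either by summing binomial probabilities or with binom.test():

sum(dbinom(7:10, size = 10, prob = 0.5))                     # 0.171875
binom.test(7, 10, p = 0.5, alternative = "greater")$p.value  # 0.171875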

2.5 Solved Example: One-Sample Sign Test

Test if the median exam score is greater than 75 using α = 0.05

Data: 78, 82, 74, 79, 81, 76, 83, 77, 80, 72

Step-by-Step Analysis

# Data
scores <- c(78, 82, 74, 79, 81, 76, 83, 77, 80, 72)
hyp_median <- 75

# Step 2: Calculate signs
signs <- scores - hyp_median
signs_nonzero <- signs[signs != 0]
positive_signs <- sum(signs_nonzero > 0)
negative_signs <- sum(signs_nonzero < 0)
n <- length(signs_nonzero)

# Step 3: Binomial test (one-tailed)
binom_result <- binom.test(positive_signs, n, p = 0.5, alternative = "greater")

# Output results
cat("Positive signs:", positive_signs, "\n")
## Positive signs: 8
cat("Negative signs:", negative_signs, "\n")
## Negative signs: 2
cat("Total non-zero differences:", n, "\n")
## Total non-zero differences: 10
cat("p-value:", binom_result$p.value, "\n")
## p-value: 0.0546875
# Step 4: Decision
if (binom_result$p.value < 0.05) {
  cat("Reject H₀: There is sufficient evidence that the median > 75.\n")
} else {
  cat("Fail to reject H₀: Insufficient evidence that the median > 75.\n")
}
## Fail to reject H₀: Insufficient evidence that the median > 75.

2.6 Exercise & Assignment

  1. Test if median reaction time is less than 300ms (α = 0.05)

Data: 280, 295, 310, 275, 290, 285, 305, 298, 282, 315

  2. A quality control manager wants to test if the median weight of packaged products equals 500g. She randomly selects 15 packages:

Weights (g): 495, 502, 498, 505, 496, 503, 499, 501, 497, 504, 500, 506, 494, 507, 493

Required:

  • Test using both Sign Test and Signed-Rank Test

  • Apply normal approximation for the Signed-Rank Test

  • Calculate 95% confidence interval for the median using both methods

  • Compare the power of both tests

  • Write a comprehensive statistical report with recommendations

3 The Wilcoxon Signed-Rank Test

As we have seen, the sign test is very easy to perform, but since it utilizes only the signs of the differences between the observations and \(m_0\) in the one-sample case, it tends to be wasteful of information. An alternative non-parametric test, the Wilcoxon signed-rank test, is less wasteful in that it takes into account the magnitudes of the differences. In this test, we rank the differences without regard to their signs, assigning rank 1 to the smallest difference in absolute value, rank 2 to the second smallest difference in absolute value, and so on up to rank \(n\) for the largest difference in absolute value. Zero differences are discarded. If the absolute values of two or more differences are the same, we assign each one the mean of the ranks which they jointly occupy. The signed-rank test is based on:

\(T^+\) is the sum of the ranks assigned to the positive differences.

\(T^-\) is the sum of the ranks assigned to the negative differences.

\(T^+\) and \(T^-\), or \(T= \min(T^+, T^-)\).

We note that \(T^+\) and \(T^-\) take values in the interval from 0 to \(\frac{n(n+1)}{2}\).

The distributions of \(T^+\) and \(T^-\) are symmetric about \(\frac{n(n+1)}{4}\).

The critical regions for the various tests are given below.

Alternative Hypothesis Critical/Rejection Region
\(m \neq m_0\) T ≤ Tα
\(m > m_0\) T⁻ ≤ T₂α
\(m < m_0\) T⁺ ≤ T₂α

The Wilcoxon Signed-Rank Test is a non-parametric statistical test used to compare two related samples (paired data) or to test whether the median of a single sample differs from a specified value. It considers both the direction and magnitude of differences, making it more powerful than the Sign Test.

3.1 Key Features:

  • More powerful than Sign Test (uses rank information)

  • Uses both direction and magnitude of differences

  • Tests median differences for paired data or single sample

  • Assumes symmetric distribution of differences

3.2 Solved Example

The following are 15 measurement of the octane rating of a certain kind of gasoline.

97.5 95.2 97.3 96.0 98.2
93.2 99.1 96.1 97.0 100.3
97.4 94.5 95.3 96.8 98.5

Required

Use the signed-rank test at the 0.05 level of significance to test whether or not the mean octane rating of the given gasoline is 98.5.

Solution

In testing \(H_0 : \mu=98.5\) against \(H_1 : \mu \neq 98.5\)

We reject \(H_0\) if \(T\leq T_{\alpha} =T_{0.05}\) for the appropriate value of \(n\).

We subtract 98.5 from every value and rank the absolute difference

Measurement Differences Absolute Differences Rank
97.5 -1.0 1.0 3
95.2 -3.3 3.3 12
97.3 -1.2 1.2 5
96.0 -2.5 2.5 10
98.2 -0.3 0.3 1
93.2 -5.3 5.3 14
99.1 0.6 0.6 2
96.1 -2.4 2.4 9
97.0 -1.5 1.5 6
100.3 1.8 1.8 8
97.4 -1.1 1.1 4
94.5 -4.0 4.0 13
95.3 -3.2 3.2 11
96.8 -1.7 1.7 7
98.5 0.0 0.0 -

The value 98.5, with zero difference, is dropped.

\(T^-=1+3+4+5+6+7+9+10+11+12+13+14=95\)

\(T^+=2+8=10\)

\(T= \min(T^+, T^-)=\min(10, 95)=10\)

For n=14, we find that \(T_{0.05}=21\)

Conclusion

Since \(10\leq 21\), we reject \(H_0\) and conclude that the mean octane rating differs from 98.5.
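As a check, R's built-in wilcox.test() reports V, which is \(T^+\); a sketch for this example (the zero difference is dropped automatically, with a warning):

octane <- c(97.5, 95.2, 97.3, 96.0, 98.2, 93.2, 99.1, 96.1,
            97.0, 100.3, 97.4, 94.5, 95.3, 96.8, 98.5)
wilcox.test(octane, mu = 98.5)    # V = T+ = 10; zero difference dropped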

3.3 Solved Example: One-Sample Signed-Rank Test

Test if the median battery life differs from 100 hours (α = 0.05)

Data: 95, 105, 98, 112, 88, 102, 108, 92, 115, 96

Solution

Step 1: State Hypotheses

H₀: Median = 100 hours

H₁: Median ≠ 100 hours (two-tailed)

Step 2: Calculate Differences from 100

Battery Life Difference (Xᵢ - 100) Absolute Difference Rank Signed Rank
95 -5 5 4.5 -4.5
105 +5 5 4.5 +4.5
98 -2 2 1.5 -1.5
112 +12 12 8.5 +8.5
88 -12 12 8.5 -8.5
102 +2 2 1.5 +1.5
108 +8 8 6.5 +6.5
92 -8 8 6.5 -6.5
115 +15 15 10 +10
96 -4 4 3 -3

Step 3: Handle Ties in Absolute Differences

  • |2| appears twice → average ranks (1+2)/2 = 1.5

  • |4| appears once → rank 3

  • |5| appears twice → average ranks (4+5)/2 = 4.5

  • |8| appears twice → average ranks (6+7)/2 = 6.5

  • |12| appears twice → average ranks (8+9)/2 = 8.5

Step 4: Calculate Rank Sums

W⁺ (sum of positive ranks) = 4.5 + 8.5 + 1.5 + 6.5 + 10 = 31

W⁻ (sum of negative ranks) = 4.5 + 1.5 + 8.5 + 6.5 + 3 = 24

Step 5: Test Statistic

W = min(W⁺, W⁻) = min(31, 24) = 24

Step 6: Critical Value (n = 10, α = 0.05, two-tailed)

From Wilcoxon table: Critical value = 8

Step 7: Decision Rule

Reject H₀ if W ≤ critical value

24 > 8 ⇒ Fail to reject H₀

Conclusion: No evidence that median battery life differs from 100 hours.
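A minimal R check of this example; because of the tied absolute differences, the exact distribution is unavailable and the normal approximation is used:

battery <- c(95, 105, 98, 112, 88, 102, 108, 92, 115, 96)
d <- battery - 100
r <- rank(abs(d))                                   # average ranks for ties
c(W_plus = sum(r[d > 0]), W_minus = sum(r[d < 0]))  # 31 and 24
wilcox.test(battery, mu = 100, exact = FALSE)       # V = W+ = 31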

3.4 Example 2: Larger Sample with Normal Approximation

Test if median response time is less than 50ms (α = 0.05)

Data: 15 response times: 45, 52, 38, 55, 48, 42, 57, 39, 51, 44, 49, 41, 53, 47, 46

Solution

Step 1: State Hypotheses

H₀: Median = 50 ms

H₁: Median < 50 ms (one-tailed)

Step 2: Calculate Differences and Ranks

Time Diff Absolute Value Rank Signed Rank
45 -5 5 8.5 -8.5
52 +2 2 3.5 +3.5
38 -12 12 15 -15
55 +5 5 8.5 +8.5
48 -2 2 3.5 -3.5
42 -8 8 12 -12
57 +7 7 11 +11
39 -11 11 14 -14
51 +1 1 1.5 +1.5
44 -6 6 10 -10
49 -1 1 1.5 -1.5
41 -9 9 13 -13
53 +3 3 5.5 +5.5
47 -3 3 5.5 -5.5
46 -4 4 7 -7

Step 3: Handle Ties

  • |1| appears twice → rank (1+2)/2 = 1.5

  • |2| appears twice → rank (3+4)/2 = 3.5

  • |3| appears twice → rank (5+6)/2 = 5.5

  • |4| appears once → rank 7

  • |5| appears twice → rank (8+9)/2 = 8.5

Step 4: Calculate Rank Sums

W⁺ = 3.5 + 8.5 + 11 + 1.5 + 5.5 = 30

W⁻ = 8.5 + 15 + 3.5 + 12 + 14 + 10 + 1.5 + 13 + 5.5 + 7 = 90

W = W⁺ = 30 (testing for values less than 50)

Step 5: Normal Approximation (n = 15 > 10)

For n > 10, we can use normal approximation:

Mean: μ = n(n+1)/4 = 15×16/4 = 60

Variance: σ² = n(n+1)(2n+1)/24 = 15×16×31/24 = 310

σ = √310 = 17.61

Z = (W - μ)/σ = (30 - 60)/17.61 = -30/17.61 = -1.70

Step 6: Critical Value and Decision

One-tailed critical Z-value at α = 0.05: -1.645

-1.70 < -1.645 ⇒ Reject H₀

Conclusion: Significant evidence that median response time is less than 50ms.
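The rank sums and the normal approximation for this example can be reproduced with a short R sketch:

times <- c(45, 52, 38, 55, 48, 42, 57, 39, 51, 44, 49, 41, 53, 47, 46)
d <- times - 50
r <- rank(abs(d))                       # average ranks for ties
W_plus <- sum(r[d > 0])                 # 30
n <- length(d)
(W_plus - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)   # -1.70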

Remarks

Handling Ties: Average ranks for tied absolute differences

Zero Differences: Discard from analysis (reduce effective n)

Small Samples (n ≤ 10): Use exact critical values from tables

Large Samples (n > 10): Use normal approximation

One-tailed vs Two-tailed: Choose W⁺ or W⁻ based on alternative hypothesis

The Signed-Rank Test provides robust median testing while being more powerful than the Sign Test.

3.5 Exercise & Assignments

A fitness trainer wants to test if a new workout program increases flexibility. She measures the maximum stretch (in cm) of 8 participants before and after 4 weeks of training:

Participant Before After
1 45 48
2 52 55
3 38 42
4 60 58
5 47 52
6 55 60
7 42 45
8 51 54

Require:

  • Perform the Sign Test manually and draw conclusion

  • Perform the Wilcoxon Signed-Rank Test manually and draw conclusion

  • Compare the results and explain why they might differ

  • Verify both tests using R

  • Which test is more appropriate here and why?

  • Calculate the effect size for the Signed-Rank Test

4 Paired Test with Small Sample

The Paired Signed-Rank Test (Wilcoxon Signed-Rank Test for paired samples) is a non-parametric statistical test used to compare two related samples or repeated measurements on a single sample. It tests whether the median difference between pairs is zero.

Key Features:

  • Non-parametric alternative to paired t-test

  • Uses ranks of absolute differences

  • Considers both direction and magnitude

  • Assumes symmetric distribution of differences

  • More powerful than Sign Test

We have the observations \((x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\).

Let \(\mu_1\) be the mean of the \(X\)'s and \(\mu_2\) be the mean of the \(Y\)'s. We want to test \(H_0 : \mu_1= \mu_2\) against \(H_1 : \mu_1 \neq \mu_2\), or \(\mu_1 > \mu_2\), or \(\mu_1 < \mu_2\).

We use the following table

Alternative Hypothesis Critical/Rejection Region
μ₁ ≠ μ₂ T ≤ Tα
μ₁ > μ₂ T⁻ ≤ T₂α
μ₁ < μ₂ T⁺ ≤ T₂α

For \(n\geq 15\) it is considered reasonable to assume that the distribution of \(T^+\) is approximately normal. To perform the signed-rank test based on this assumption, we need the following results.

THEOREM

The mean and variance of \(T^+\) are \[ E(T^+)=\frac{n(n+1)}{4}\] \[ Var(T^+)=\frac{n(n+1)(2n+1)}{24}\]

Proof

If the sample is of size \(n\), then the ranks to be assigned are \(1,2,3,\dots,n\). For each rank, the probability that it is attached to a positive or to a negative difference is \(\frac{1}{2}\) when \(H_0\) is true.

Let \[Z_i=\begin{cases} 1, & \text{if the difference receiving rank } i \text{ is positive} \\ 0, & \text{if the difference receiving rank } i \text{ is negative} \end{cases}\] for \(i=1,2,3,\dots,n\).

\[T^+=1\cdot Z_1 + 2\cdot Z_2 + \dots +n\cdot Z_n\] where \(Z_1, Z_2, \dots, Z_n\) are independent random variables each having a Bernoulli distribution with \(P=\frac{1}{2}\).

Since \(E(Z_i)=\frac{1}{2}\) and \(Var(Z_i)=\frac{1}{4}\), we have

\[E(T^+)=E(1\cdot Z_1 + 2\cdot Z_2 + \dots +n\cdot Z_n)\] \[E(T^+)=1E(Z_1) + 2E(Z_2) + \dots +nE(Z_n)\] \[E(T^+)=1\cdot\frac{1}{2}+2\cdot\frac{1}{2}+ \dots+ n\cdot\frac{1}{2}\] \[E(T^+)=\frac{1+2+\dots+n}{2}=\frac{n(n+1)}{4}\]

\[Var(T^+)=Var(1\cdot Z_1 + 2\cdot Z_2 + \dots +n\cdot Z_n)\] \[Var(T^+)=1^2Var(Z_1) + 2^2Var(Z_2) + \dots +n^2Var(Z_n)\] \[Var(T^+)=1\cdot\frac{1}{4}+4\cdot\frac{1}{4}+ \dots+ n^2\cdot\frac{1}{4}\] \[Var(T^+)=\frac{1^2+2^2+\dots+n^2}{4}=\frac{n(n+1)(2n+1)}{24}\]
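The theorem can be checked numerically; the following is a small simulation sketch under \(H_0\), where each rank is attached to a positive difference with probability 1/2:

n <- 10
n * (n + 1) / 4                              # E(T+) = 27.5
n * (n + 1) * (2 * n + 1) / 24               # Var(T+) = 96.25
set.seed(1)
Tplus <- replicate(1e5, sum((1:n)[runif(n) < 0.5]))
c(mean(Tplus), var(Tplus))                   # close to 27.5 and 96.25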

4.1 Solved Example

The following are the weights, in pounds, before and after, of 16 persons who stayed on a certain reducing diet for 4 weeks.

Before After
147.0 137.9
183.5 176.2
232.1 219.0
161.6 163.8
197.5 193.5
206.3 201.4
177.0 180.6
215.4 203.2
147.7 149.0
208.1 195.4
166.8 158.3
131.8 134.4
150.3 149.3
197.2 189.1
159.8 159.1
171.7 173.2

Required

Use the signed rank test to test at 0.05 level of significance whether the weight reducing diet is effective.

Solution

We want to test \(H_0 : \mu_1= \mu_2\) against \(H_1 :\mu_1> \mu_2\).
We reject \(H_0\) if \(Z\geq Z_{0.05}=1.645\), where the test statistic is

\[Z=\frac{T^+-E(T^+)}{\sqrt{Var(T^+)}}\]

The difference between the respective pair are

Pair Differences Absolute Differences Rank
1 9.1 9.1 13
2 7.3 7.3 10
3 13.1 13.1 16
4 -2.2 2.2 5
5 4.0 4.0 8
6 4.9 4.9 9
7 -3.6 3.6 7
8 12.2 12.2 14
9 -1.3 1.3 3
10 12.7 12.7 15
11 8.5 8.5 12
12 -2.6 2.6 6
13 1.0 1.0 2
14 8.1 8.1 11
15 0.7 0.7 1
16 -1.5 1.5 4

\(T^+=13+10+16+8+9+14+15+12+2+11+1=111\)

\[E(T^+)=\frac{n(n+1)}{4}=\frac{16*17}{4}=68\]

\[Var(T^+)=\frac{n(n+1)(2n+1)}{24}=\frac{16\times 17\times 33}{24}=374\]

\[Z=\frac{111-68}{\sqrt{374}}=2.22\]

Conclusion

Since \(2.22> 1.645\), we reject \(H_0\) at the 0.05 significance level and conclude that the weight-reducing diet is effective.
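A sketch reproducing \(T^+\) and \(Z\) in R (the absolute differences here are all distinct, so no tie handling is needed):

before <- c(147.0, 183.5, 232.1, 161.6, 197.5, 206.3, 177.0, 215.4,
            147.7, 208.1, 166.8, 131.8, 150.3, 197.2, 159.8, 171.7)
after <- c(137.9, 176.2, 219.0, 163.8, 193.5, 201.4, 180.6, 203.2,
           149.0, 195.4, 158.3, 134.4, 149.3, 189.1, 159.1, 173.2)
d <- before - after
Tplus <- sum(rank(abs(d))[d > 0])            # 111
(Tplus - 68) / sqrt(374)                     # Z = 2.22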

4.2 Solved Example: Paired Signed-Rank Test

Test if a diet program reduces weight (α = 0.05)

Data: 8 participants’ weights before and after

Participant Before After Difference Absolute Value Rank Signed Rank
1 185 175 -10 10 6.5 -6.5
2 192 190 -2 2 1.5 -1.5
3 210 195 -15 15 8 -8
4 178 180 +2 2 1.5 +1.5
5 165 160 -5 5 3.5 -3.5
6 195 185 -10 10 6.5 -6.5
7 200 195 -5 5 3.5 -3.5
8 188 182 -6 6 5 -5

Solution

Step 1: State Hypotheses

H₀: Median difference = 0 (no weight change)

H₁: Median difference < 0 (weight loss)

Step 2: Calculate Differences and Ranks (Shown in table above)

Step 3: Handle Ties

  • |2| appears twice → average ranks (1+2)/2 = 1.5

  • |5| appears twice → average ranks (3+4)/2 = 3.5

  • |10| appears twice → average ranks (6+7)/2 = 6.5

Step 4: Calculate Rank Sums

W⁺ (positive ranks) = 1.5

W⁻ (negative ranks) = 6.5 + 1.5 + 8 + 3.5 + 6.5 + 3.5 + 5 = 34.5

Step 5: Test Statistic

For one-tailed test: W = W⁺ = 1.5 (since we’re testing for decrease)

Step 6: Critical Value (n = 8, α = 0.05, one-tailed)

From Wilcoxon table: Critical value = 3

Step 7: Decision Rule

Reject H₀ if W ≤ critical value

Since 1.5 ≤ 3 ⇒ Reject H₀

Conclusion: Significant evidence that the diet program reduces weight.
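A quick R check of the rank sums for this example:

before <- c(185, 192, 210, 178, 165, 195, 200, 188)
after <- c(175, 190, 195, 180, 160, 185, 195, 182)
d <- after - before
r <- rank(abs(d))
c(W_plus = sum(r[d > 0]), W_minus = sum(r[d < 0]))   # 1.5 and 34.5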

4.3 Solved Example:

Test if a new teaching method improves mathematics scores (α = 0.05)

Data: 8 students’ scores before and after training

Student Before After Difference AV Rank Signed Rank
1 65 72 +7 7 6 +6
2 58 65 +7 7 6 +6
3 72 68 -4 4 1.5 -1.5
4 80 85 +5 5 3.5 +3.5
5 68 75 +7 7 6 +6
6 75 80 +5 5 3.5 +3.5
7 62 70 +8 8 8 +8
8 70 74 +4 4 1.5 +1.5

Solution

Step 1: State Hypotheses

H₀: Median difference = 0 (no improvement)

H₁: Median difference > 0 (improvement) - one-tailed

Step 2: Calculate Differences and Absolute Values

(Shown in table above)

Step 3: Rank Absolute Differences

Handle ties by averaging ranks:

  • |4| appears twice: ranks (1+2)/2 = 1.5

  • |5| appears twice: ranks (3+4)/2 = 3.5

  • |7| appears three times: ranks (5+6+7)/3 = 6

  • |8| appears once: rank 8

Ranking:

AV Rank
4 1.5
4 1.5
5 3.5
5 3.5
7 6
7 6
7 6
8 8

Step 4: Assign Signed Ranks

Student Signed Rank
1 +6
2 +6
3 -1.5
4 +3.5
5 +6
6 +3.5
7 +8
8 +1.5

Step 5: Calculate Rank Sums

W⁺ = 6 + 6 + 3.5 + 6 + 3.5 + 8 + 1.5 = 34.5

W⁻ = 1.5

W = min(W⁺, W⁻) = 1.5

Step 6: Critical Value (n=8, α=0.05, one-tailed)

From Wilcoxon table: Critical value = 3

Step 7: Decision

W = 1.5 ≤ 3 ⇒ Reject H₀

Conclusion: Significant evidence of improvement at the 5% level.
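A sketch verifying the rank sums above in R; with ties present, wilcox.test() falls back on the normal approximation:

before <- c(65, 58, 72, 80, 68, 75, 62, 70)
after <- c(72, 65, 68, 85, 75, 80, 70, 74)
d <- after - before
r <- rank(abs(d))
c(W_plus = sum(r[d > 0]), W_minus = sum(r[d < 0]))   # 34.5 and 1.5
wilcox.test(after, before, paired = TRUE, alternative = "greater",
            exact = FALSE)   # V = 34.5, small p-value, consistent with rejecting H0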

4.4 Exercise & Assignment

  1. A fitness trainer wants to test if a new workout program increases flexibility. She measures the maximum stretch (in cm) of 8 participants before and after 4 weeks of training:
Participant Before After
1 45 48
2 52 55
3 38 42
4 60 58
5 47 52
6 55 60
7 42 45
8 51 54

Required

  • Perform the paired Signed-Rank Test manually and draw conclusion

  • Verify both tests using R

  2. A clinical trial tested a new drug’s effect on pain reduction. Patients rated their pain level (1-10 scale) before and after treatment:

Data:

Patient Before After
1 8 5
2 7 6
3 9 4
4 6 5
5 8 3
6 7 4
7 9 6
8 5 4
9 8 5
10 7 3
11 6 2
12 8 4
13 7 5
14 9 7
15 8 6

Required:

  • Perform manual paired Signed-Rank Test with normal approximation

  • Construct 95% confidence interval for median difference

  • Write comprehensive statistical report

  3. The accompanying table gives the scores of a group of 15 students in Mathematics and Science
Student Mathematics Science
1 22 33
2 37 68
3 36 42
4 38 49
5 42 51
6 58 65
7 58 65
8 60 71
9 62 55
10 65 74
11 66 68
12 56 64
13 66 67
14 67 73
15 62 65

Required

Use Wilcoxon’s signed-rank test to determine if the location of the distribution of scores differs significantly for the two subjects at \(\alpha = 0.05\)

5 The Rank-Sum Test (The U Test / Mann-Whitney Test)

The \(U\)-test is used for two samples. Other names for the \(U\)-test are:

  • Wilcoxon rank-sum test

  • Mann-Whitney test

We are going to test the hypothesis that we are sampling two identical continuous populations against the alternative that the two populations have unequal means. As originally proposed by Wilcoxon, the test is based on the values of:

\(W_1\) - the sum of ranks of the first sample \(W_2\) - the sum of ranks of the second sample

It does not matter whether we use \(W_1\) or \(W_2\). There are \(n_1\) values in the first sample and \(n_2\) values in the second sample, so

\(W_1+W_2\) is the sum of the first \(n_1+n_2\) integers, that is,

\[W_1+W_2=\frac{(n_1+n_2)(n_1+n_2 +1)}{2}\]

for any pair of values of \(W_1\) and \(W_2\).

Thus the tests based on \(W_1\) and \(W_2\) are equivalent.

In actual practice, we seldom use tests based on the statistics \(W_1\) and \(W_2\); instead, we use the related statistics

\[U_1=W_1 -\frac{(n_1)(n_1 +1)}{2}\] \[U_2=W_2 -\frac{(n_2)(n_2 +1)}{2}\]

or the test statistic

\(U=min(U_1, U_2)\)

The resulting tests are all equivalent to the ones based on \(W_1\) and \(W_2\), but they have the advantage that they lend themselves more readily to the construction of tables of critical values.

\[U_1 + U_2 =W_1 -\frac{(n_1)(n_1 +1)}{2} +W_2 -\frac{(n_2)(n_2 +1)}{2}\] \[U_1 + U_2 =\frac{(n_1 +n_2)(n_1 +n_2 + 1)}{2} -\frac{(n_1)(n_1 +1)}{2} -\frac{(n_2)(n_2 +1)}{2}\] \[U_1 + U_2 =n_1n_2\]

that is, The sum of \(U_1 + U_2\) is always \(n_1n_2\) and both of these random variables take on the same range of values from \(0\) to \(n_1n_2\).

We use the following table for testing \(H_0 : \mu_1= \mu_2\) against \(H_1 :\mu_1 \neq \mu_2\), or \(\mu_1> \mu_2\), or \(\mu_1< \mu_2\):

Alternative Hypothesis Critical / Rejection Region
μ₁ ≠ μ₂ U ≤ Uₐ
μ₁ > μ₂ U₂ ≤ U₂ₐ
μ₁ < μ₂ U₁ ≤ U₂ₐ

where the level of significance is \(\alpha\) for each test. The critical values of \(U\) are such that \(U_\alpha\) is the largest value for which \(P(U\leq U_\alpha)\) does not exceed \(\alpha\).

The Mann-Whitney U test (also called the Wilcoxon rank-sum test) is a non-parametric test of the null hypothesis that two independent samples come from the same population, against the alternative that the two populations differ, typically in their medians.

It is often used as an alternative to the independent-samples t-test when the assumptions of normality or equal variances are not satisfied.

Key Points / Notes

  • Non-parametric alternative to independent t-test

  • Compares two independent groups

  • Uses ranks rather than raw data values

  • Tests stochastic superiority (whether one group tends to have higher values)

  • No normality assumption required

Hypotheses:

H₀: The two samples come from identical populations (equal medians)

H₁: The two populations differ in location (unequal medians)

Assumptions:

  • Independent random samples

  • Ordinal or continuous data

  • Shape of distributions should be similar

Procedure:

  1. Combine the two groups into one dataset.

  2. Assign ranks to all observations (average ranks in case of ties).

  3. Compute the sum of ranks for each group.

  4. Use the rank sums to calculate the U statistic:

\[U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1\]

\[U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - R_2\]

\[U = \min(U_1, U_2)\] where: - \(n_1\) = size of group 1
- \(R_1\) = sum of ranks in group 1
- Similarly, compute \(U\) for group 2
- The smaller of the two \(U\) values is the test statistic

Decision Rule

  • For small samples: Compare the observed \(U\) with critical values from Mann–Whitney tables
  • For large samples: Convert to a z-score and use the normal approximation

Interpretation

  • A small U indicates that the distributions differ (one group tends to have systematically higher/lower values than the other)

Advantages:

  • Doesn’t assume normality.

  • Works with ordinal data or continuous data.

  • More robust than the t-test under non-normal distributions.

5.1 Solved Example

Two samples of flares had the following burning times.

Brand A Brand B
14.9 15.2
11.3 19.8
13.2 14.7
16.6 18.3
17.0 16.2
14.1 21.2
15.4 18.9
13.0 12.2
16.9 15.3
19.4

Test whether the two samples come from identical continuous populations, or whether the mean burning time of Brand A flares is less than that of Brand B flares, at \(\alpha=0.05\)

Solution

We wish to test \(H_0 : \mu_1= \mu_2\) against \(H_1 :\mu_1< \mu_2\). Arrange the values in ascending order and then rank them jointly; we have:

\(W_1\) - the sum of ranks of the first sample

\(W_2\) - the sum of ranks of the second sample

\[W_1=1+3+4+5+7+10+12+13+14=69\]

\[U_1=W_1 -\frac{(n_1)(n_1 +1)}{2} =69 -\frac{(9)(10)}{2} =24\]

For \(n_1 =9\) and \(n_2=10\), \(U_{2\alpha} =U_{2(0.05)} =24\).

Conclusion

Since \(24\leq 24\), we reject \(H_0\) and conclude that, on average, Brand A flares have a mean burning time that is less than that of Brand B.
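A minimal R check; in wilcox.test(x, y), the reported W equals \(U_1\) for the first sample:

brand_A <- c(14.9, 11.3, 13.2, 16.6, 17.0, 14.1, 15.4, 13.0, 16.9)
brand_B <- c(15.2, 19.8, 14.7, 18.3, 16.2, 21.2, 18.9, 12.2, 15.3, 19.4)
W1 <- sum(rank(c(brand_A, brand_B))[1:9])             # 69
W1 - 9 * 10 / 2                                       # U1 = 24
wilcox.test(brand_A, brand_B, alternative = "less")   # W = U1 = 24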

When both \(n_1\) and \(n_2\) are greater than 8, it is considered reasonable to assume that the distributions of \(U_1\) and \(U_2\) can be approximated closely by normal distributions. To perform the rank-sum test on the basis of this assumption, we need the following results.

THEOREM

Under the null hypothesis, the means and variances of \(U_1\) and \(U_2\) are \[E(U_1)=E(U_2)=\frac{n_1 n_2}{2}\] \[Var(U_1)=Var(U_2)=\frac{n_1 n_2(n_1 + n_2 +1)}{12}\]

5.2 Solved Example

The following are the numbers of minutes it took random samples of 15 men and 12 women to complete a written test given for the renewal of their driving licences.

Men Women
9.9 8.6
7.4 10.9
8.9 9.8
9.1 10.7
7.7 9.4
9.7 10.3
11.8 7.3
9.2 11.5
10.0 7.6
10.2 9.3
9.5 8.8
10.8 9.6
8.0
11.0
7.5

Required

Use the U test based on the normal approximation to test.

\(H_0 : \mu_1= \mu_2\) against \(H_1 :\mu_1 \neq \mu_2\)

where \(\mu_1\) and \(\mu_2\) are the average amount of times it takes men and women to complete the test at \(\alpha=0.01\)

Arrange the values in ascending order and then rank them jointly; we have:

\(W_1\) - the sum of ranks of the first sample

\(W_2\) - the sum of ranks of the second sample

\[W_1=2+3+5+6+9+10+11+14+16+18+19+20+23+25+27=208\] \[W_2=1+4+7+8+12+13+15+17+21+22+24+26=170\]

\[U_1=W_1 -\frac{(n_1)(n_1 +1)}{2} =208-(15)(8)=88\] \[U_2=W_2 -\frac{(n_2)(n_2 +1)}{2} =170-(6)(13)=92\]

The test statistics,

\(U= min (U_1, U_2) = min(88,92) = 88\)

\[E(U_1)=E(U_1)=\frac{n_1 n_2}{2}=\frac{15*12}{2}=90\] \[Var(U_1)=Var(U_1)=\frac{n_1 n_2(n_1 + n_2 +1)}{12}=\frac{15*12(28)}{12}=420\]

\[Z=\frac{U-E(U)}{\sqrt{Var(U)}}=\frac{88-90}{\sqrt{420}}=-0.098 \approx -0.1\]

Conclusion

Since \(|Z|=0.1 < 2.576=Z_{0.005}\), we fail to reject \(H_0\) and conclude that the average amounts of time it takes men and women to complete the test are the same at \(\alpha=0.01\)
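The rank sums and the normal approximation for this example can be reproduced with a short R sketch:

men <- c(9.9, 7.4, 8.9, 9.1, 7.7, 9.7, 11.8, 9.2, 10.0, 10.2,
         9.5, 10.8, 8.0, 11.0, 7.5)
women <- c(8.6, 10.9, 9.8, 10.7, 9.4, 10.3, 7.3, 11.5, 7.6, 9.3, 8.8, 9.6)
W1 <- sum(rank(c(men, women))[1:15])     # 208
U1 <- W1 - 15 * 16 / 2                   # 88
U <- min(U1, 15 * 12 - U1)               # 88
(U - 15 * 12 / 2) / sqrt(15 * 12 * (15 + 12 + 1) / 12)   # Z = -0.098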

5.3 Solved Example

Compare exam scores between two teaching methods (α = 0.05)

Method A: 78, 85, 92, 65, 70, 88

Method B: 72, 68, 80, 75, 82, 79, 74

Solution

Step 1: State Hypotheses

H₀: The two populations are identical

H₁: Method A tends to produce higher scores than Method B (one-tailed)

Step 2: Combine and Rank All Scores

Score Group Rank
65 A 1
68 B 2
70 A 3
72 B 4
74 B 5
75 B 6
78 A 7
79 B 8
80 B 9
82 B 10
85 A 11
88 A 12
92 A 13

Step 3: Calculate Rank Sums

R₁ (Method A) = 1 + 3 + 7 + 11 + 12 + 13 = 47

R₂ (Method B) = 2 + 4 + 5 + 6 + 8 + 9 + 10 = 44

Step 4: Calculate U Statistics

n₁ = 6, n₂ = 7

U₁ = n₁n₂ + n₁(n₁+1)/2 - R₁ = 6×7 + 6×7/2 - 47 = 42 + 21 - 47 = 16

U₂ = n₁n₂ + n₂(n₂+1)/2 - R₂ = 6×7 + 7×8/2 - 44 = 42 + 28 - 44 = 26

U = min(U₁, U₂) = 16

Step 5: Critical Value (n₁=6, n₂=7, α=0.05, one-tailed)

From Mann-Whitney table: Critical value = 8

Step 6: Decision

U = 16 > 8 ⇒ Fail to reject H₀

Conclusion: No evidence that Method A produces higher scores.

R Code Verification

# Example 1: Teaching Methods
method_A <- c(78, 85, 92, 65, 70, 88)
method_B <- c(72, 68, 80, 75, 82, 79, 74)

wilcox.test(method_A, method_B, alternative = "greater", exact = TRUE)
## 
##  Wilcoxon rank sum exact test
## 
## data:  method_A and method_B
## W = 26, p-value = 0.2669
## alternative hypothesis: true location shift is greater than 0
# Manual calculation function
manual_mann_whitney <- function(group1, group2) {
    # Combine and rank data
    combined <- c(group1, group2)
    ranks <- rank(combined)
    
    n1 <- length(group1)
    n2 <- length(group2)
    
    R1 <- sum(ranks[1:n1])
    R2 <- sum(ranks[(n1+1):(n1+n2)])
    
    U1 <- n1*n2 + n1*(n1+1)/2 - R1
    U2 <- n1*n2 + n2*(n2+1)/2 - R2
    
    cat("Group 1 ranks sum (R1):", R1, "\n")
    cat("Group 2 ranks sum (R2):", R2, "\n")
    cat("U1 =", U1, "U2 =", U2, "\n")
    cat("Test statistic U =", min(U1, U2), "\n")
    
    return(list(U1 = U1, U2 = U2, U = min(U1, U2), R1 = R1, R2 = R2))
}

manual_mann_whitney(method_A, method_B)
## Group 1 ranks sum (R1): 47 
## Group 2 ranks sum (R2): 44 
## U1 = 16 U2 = 26 
## Test statistic U = 16
## $U1
## [1] 16
## 
## $U2
## [1] 26
## 
## $U
## [1] 16
## 
## $R1
## [1] 47
## 
## $R2
## [1] 44
# Example 2: Reaction Times
young <- c(280, 295, 310, 290, 285, 300)
elderly <- c(320, 335, 310, 325, 330, 315, 340)

wilcox.test(young, elderly, alternative = "less", exact = TRUE)
## Warning in wilcox.test.default(young, elderly, alternative = "less", exact =
## TRUE): cannot compute exact p-value with ties
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  young and elderly
## W = 0.5, p-value = 0.002111
## alternative hypothesis: true location shift is less than 0
manual_mann_whitney(young, elderly)
## Group 1 ranks sum (R1): 21.5 
## Group 2 ranks sum (R2): 69.5 
## U1 = 41.5 U2 = 0.5 
## Test statistic U = 0.5
## $U1
## [1] 41.5
## 
## $U2
## [1] 0.5
## 
## $U
## [1] 0.5
## 
## $R1
## [1] 21.5
## 
## $R2
## [1] 69.5

5.4 Solved Example

Compare reaction times between two age groups (α = 0.05)

Young: 280, 295, 310, 290, 285, 300

Elderly: 320, 335, 310, 325, 330, 315, 340

Solution

Step 1: State Hypotheses

H₀: The two populations are identical

H₁: Young group has faster reaction times (one-tailed)

Step 2: Combine and Rank All Scores (Handle Ties)

Time Group Rank
280 Young 1
285 Young 2
290 Young 3
295 Young 4
300 Young 5
310 Young 6.5
310 Elderly 6.5
315 Elderly 8
320 Elderly 9
325 Elderly 10
330 Elderly 11
335 Elderly 12
340 Elderly 13

Step 3: Calculate Rank Sums

R₁ (Young) = 1 + 2 + 3 + 4 + 5 + 6.5 = 21.5

R₂ (Elderly) = 6.5 + 8 + 9 + 10 + 11 + 12 + 13 = 69.5

Step 4: Calculate U Statistics

n₁ = 6, n₂ = 7

U₁ = n₁n₂ + n₁(n₁+1)/2 - R₁ = 6×7 + 6×7/2 - 21.5 = 42 + 21 - 21.5 = 41.5

U₂ = n₁n₂ + n₂(n₂+1)/2 - R₂ = 6×7 + 7×8/2 - 69.5 = 42 + 28 - 69.5 = 0.5

U = min(U₁, U₂) = 0.5

Step 5: Critical Value (n₁=6, n₂=7, α=0.05, one-tailed)

From Mann-Whitney table: Critical value = 8

Step 6: Decision

U = 0.5 ≤ 8 ⇒ Reject H₀

Conclusion: Strong evidence that young group has faster reaction times

5.5 Exercise & Assignment

  1. A pharmaceutical company tested a new drug against a placebo. Patients rated their symptom improvement (0-100 scale):

Drug Group: 75, 82, 68, 90, 78, 85, 72, 88

Placebo Group: 62, 58, 71, 65, 70, 63, 67, 60, 64

Required:

  • Perform manual Mann-Whitney U test (show all steps)

  • Test at α = 0.01 significance level

  • Calculate exact p-value using normal approximation

  • Verify with R code

  • Interpret results for clinical context

  2. Compare Employee productivity between two departments:

Department A: 45, 52, 38, 60, 47, 55, 42, 51, 49, 53

Department B: 58, 62, 55, 65, 60, 57, 63, 59, 61, 58, 56, 64

Required:

  • Perform manual Mann-Whitney U test with normal approximation

  • Calculate effect size (r = Z/√N)

  • Construct 95% confidence interval for median difference

  • Compare with independent t-test results

  • Write comprehensive statistical report

R Code Template for Assignments

# Assignment Question 1 Template
drug <- c(75, 82, 68, 90, 78, 85, 72, 88)
placebo <- c(62, 58, 71, 65, 70, 63, 67, 60, 64)

# Mann-Whitney U Test
wilcox.test(drug, placebo, alternative = "greater", exact = TRUE)
## 
##  Wilcoxon rank sum exact test
## 
## data:  drug and placebo
## W = 70, p-value = 0.0001645
## alternative hypothesis: true location shift is greater than 0
# Manual calculation
manual_mann_whitney(drug, placebo)
## Group 1 ranks sum (R1): 106 
## Group 2 ranks sum (R2): 47 
## U1 = 2 U2 = 70 
## Test statistic U = 2
## $U1
## [1] 2
## 
## $U2
## [1] 70
## 
## $U
## [1] 2
## 
## $R1
## [1] 106
## 
## $R2
## [1] 47
# Assignment Question 2 Template
dept_A <- c(45, 52, 38, 60, 47, 55, 42, 51, 49, 53)
dept_B <- c(58, 62, 55, 65, 60, 57, 63, 59, 61, 58, 56, 64)

wilcox.test(dept_A, dept_B, alternative = "two.sided", exact = FALSE)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  dept_A and dept_B
## W = 7, p-value = 0.0005309
## alternative hypothesis: true location shift is not equal to 0
# Normal approximation details
mw_details <- function(group1, group2) {
    result <- manual_mann_whitney(group1, group2)
    n1 <- length(group1)
    n2 <- length(group2)
    
    # Normal approximation
    mu_U <- n1*n2/2
    sigma_U <- sqrt(n1*n2*(n1+n2+1)/12)
    z <- (result$U - mu_U)/sigma_U
    
    cat("Mean U:", mu_U, "\n")
    cat("SD U:", sigma_U, "\n")
    cat("Z-score:", z, "\n")
    
    # Effect size
    effect_r <- abs(z)/sqrt(n1+n2)
    cat("Effect size r:", effect_r, "\n")
    
    return(list(z = z, effect_r = effect_r))
}

mw_details(dept_A, dept_B)
## Group 1 ranks sum (R1): 62 
## Group 2 ranks sum (R2): 191 
## U1 = 113 U2 = 7 
## Test statistic U = 7 
## Mean U: 60 
## SD U: 15.16575 
## Z-score: -3.494717 
## Effect size r: 0.7450761
## $z
## [1] -3.494717
## 
## $effect_r
## [1] 0.7450761

5.6 Limitations

  • Less powerful than the t-test when data are truly normal
  • Tests difference in distributions, not strictly medians (though often interpreted as median comparison)

6 Kruskal-Wallis H test

The H-test is a generalization of the rank-sum test to the case where we test the null hypothesis that \(k\) samples come from identical continuous populations. As in the U-test, the data are ranked jointly from low to high as though they constitute one sample. Then, letting \(R_i\) be the sum of the ranks of the values of the \(i^{th}\) sample, we base the test on the statistic

\[H=\frac{12}{n(n+1)} \sum_{i=1}^{k} n_i\left[\frac{R_i}{n_i}-\frac{n+1}{2}\right]^{2}\] The \(H\) statistic is therefore proportional to a weighted sum of the squared differences \[\left[\frac{R_i}{n_i}-\frac{n+1}{2}\right]^{2}\]

where \(\frac{R_i}{n_i}\) is the mean rank for sample \(i\) and \(\frac{n+1}{2}\) is the mean rank for all the data; it follows that the null hypothesis must be rejected for large values of \(H\).

Simplifying the expression, we have

\[\left[\frac{R_i}{n_i}-\frac{n+1}{2}\right]^{2}=\frac{R_{i}^{2}}{n_{i}^{2}}-\frac{2R_i}{n_i}\left(\frac{n+1}{2}\right)+\left[\frac{n+1}{2}\right]^2\]

\[\left[\frac{R_i}{n_i}-\frac{n+1}{2}\right]^{2}=\frac{R_{i}^{2}}{n_{i}^{2}}-\frac{R_i}{n_i }(n+1)+\frac{(n+1)^2}{4}\]

Multiplying through by \(n_i\),

\[n_i\left[\frac{R_i}{n_i}-\frac{n+1}{2}\right]^{2}=\frac{R_{i}^{2}}{n_{i}}-R_i(n+1)+n_i\frac{(n+1)^2}{4}\]

Summing over \(i\),

\[\sum_{i=1}^{k}n_i\left[\frac{R_{i}}{n_{i}}-\frac{n+1}{2}\right]^2=\sum_{i=1}^{k}\frac{R_{i}^{2}}{n_{i}}-(n+1)\sum_{i=1}^{k}R_i +\frac{(n+1)^2}{4}\sum_{i=1}^{k}n_i\]

Since \(\sum_{i=1}^{k}R_i=\frac{n(n+1)}{2}\) and \(\sum_{i=1}^{k}n_i=n\),

\[\sum_{i=1}^{k}n_i\left[\frac{R_{i}}{n_{i}}-\frac{n+1}{2}\right]^2=\sum_{i=1}^{k}\frac{R_{i}^{2}}{n_{i}}-(n+1)\frac{n(n+1)}{2}+\frac{n(n+1)^2}{4}\]

\[\sum_{i=1}^{k}n_i\left[\frac{R_{i}}{n_{i}}-\frac{n+1}{2}\right]^2=\sum_{i=1}^{k}\frac{R_{i}^{2}}{n_{i}}-\frac{n(n+1)^2}{4}\]

Substituting in H Statistic, we have

\[H=\frac{12}{n(n+1)}\left[\sum_{i=1}^{k}\frac{R_{i}^{2}}{n_{i}}-\frac{n(n+1)^2}{4}\right] \]

\[H=\frac{12}{n(n+1)}\sum_{i=1}^{k}\left[\frac{R_{i}^{2}}{n_{i}}\right]-3(n+1)\]

For very small values of \(k\) and \(n_i\), the test of the null hypothesis may be based on special tables, but since the sampling distribution of \(H\) depends on the values of the \(n_i\), it is impossible to tabulate it in a compact form. Hence, the test is usually based on the large-sample theory that the sampling distribution of \(H\) can be approximated closely by a chi-square distribution with \(k-1\) degrees of freedom.

The Kruskal-Wallis H test is a non-parametric statistical test used to determine whether there are statistically significant differences between three or more independent groups, i.e., whether the samples originate from the same distribution. It is the non-parametric alternative to the one-way ANOVA.

Key Characteristics:

  • Non-parametric alternative to one-way ANOVA

  • Compares three or more independent groups

  • Uses ranks of the data rather than the raw data

  • Tests the null hypothesis that the medians of the groups are equal

Assumptions:

  • The data are independent and randomly sampled.

  • The dependent variable should be measured at least at the ordinal level.

  • The distributions of the groups should have the same shape.

Hypotheses:

H₀: The medians of all groups are equal.

H₁: At least one group has a median different from the others.

The test statistic \(H\) is calculated as:

\[ H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1) \]

Where:

  • \(N\) = total number of observations across all groups
  • \(k\) = number of groups
  • \(R_i\) = sum of ranks for group \(i\)
  • \(n_i\) = number of observations in group \(i\)

Adjustment for Ties

If there are tied ranks, the test statistic is adjusted by dividing \(H\) by:

\[ 1 - \frac{\sum (t^3 - t)}{N^3 - N} \]

Where:

  • \(t\) = number of tied observations in each group of ties
  • The summation is over all groups of tied ranks

Decision Rule:

Reject H₀ if H ≥ critical value from chi-square distribution with (k-1) degrees of freedom, or if p-value ≤ α.

Interpretation

  • A large H suggests that at least one group differs significantly from the others.
  • The test statistic follows a chi-squared distribution with \(k - 1\) degrees of freedom.

Notes

  • The Kruskal–Wallis test is an extension of the Mann–Whitney U test for more than two groups.
  • It does not assume normality and is suitable for ordinal or non-normally distributed interval data.

6.1 Solved Example

The following are the miles per gallon which a test driver got for ten tanks of each of the three kinds of gasoline.

Gasoline A Gasoline B Gasoline C
20 29 19
31 18 31
24 29 16
33 19 26
23 20 31
24 21 33
28 34 28
16 33 28
19 30 25
26 23 30

Required

Use the Kruskal Wallis test to test whether or not there is a difference in the actual average mileage yield of the three kinds of gasoline at \(\alpha=0.05\)

Solution

Arrange the values in ascending order and rank them jointly, assigning tied values the mean of the ranks they jointly occupy (only the first and last few rows are shown):

Value Position Rank (ties averaged)
16 1 1.5
16 2 1.5
18 3 3
... ... ...
33 29 28
34 30 30

\(R_{1}=136.0\) is the sum of ranks for sample \(A\)

\(R_{2}=156.5\) is the sum of ranks for sample \(B\)

\(R_{3}=172.5\) is the sum of ranks for sample \(C\)

\[H=\frac{12}{n(n+1)}\sum_{i=1}^{k}\left[\frac{R_{i}^{2}}{n_{i}}\right]-3(n+1)=\frac{12}{30(31)}\left[\frac{136^{2}}{10}+\frac{156.5^{2}}{10}+\frac{172.5^{2}}{10}\right]-3(31)=0.86\]

The critical chi-square value is \(\chi_{0.05, 2}^2=5.991\).

Conclusion

Since \(0.86<5.991\), we fail to reject \(H_0\) and conclude that there is no difference in the actual average mileage yield of the three kinds of gasoline at \(\alpha=0.05\)
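An R check of this example; kruskal.test() applies the tie correction automatically, so its statistic is slightly larger than the uncorrected \(H=0.86\) computed by hand:

gas_A <- c(20, 31, 24, 33, 23, 24, 28, 16, 19, 26)
gas_B <- c(29, 18, 29, 19, 20, 21, 34, 33, 30, 23)
gas_C <- c(19, 31, 16, 26, 31, 33, 28, 28, 25, 30)
kruskal.test(list(gas_A, gas_B, gas_C))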

6.2 Solved Example

Compare test scores across three teaching methods (α = 0.05)

Method A: 78, 85, 92, 65, 70

Method B: 72, 68, 80, 75, 82

Method C: 90, 88, 95, 85, 80

Solution

Step 1: State Hypotheses

H₀: The medians of the three groups are equal

H₁: At least one group has a different median

Step 2: Combine and Rank All Scores

Score Group Rank
65 A 1
68 B 2
70 A 3
72 B 4
75 B 5
78 A 6
80 B 7.5
80 C 7.5
82 B 9
85 A 10.5
85 C 10.5
88 C 12
90 C 13
92 A 14
95 C 15

Step 3: Calculate Rank Sums for Each Group

R₁ (Method A) = 1 + 3 + 6 + 10.5 + 14 = 34.5

R₂ (Method B) = 2 + 4 + 5 + 7.5 + 9 = 27.5

R₃ (Method C) = 7.5 + 10.5 + 12 + 13 + 15 = 58

Step 4: Calculate H Statistic

  • Total observations: \(N = 15\)
  • Number of groups: \(k = 3\)
  • Group sizes: \(n_1 = n_2 = n_3 = 5\)
  • Sum of ranks:
  • \(R_1 = 34.5\)
  • \(R_2 = 27.5\)
  • \(R_3 = 58\)

\[ H = \frac{12}{N(N+1)} \left( \frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \frac{R_3^2}{n_3} \right) - 3(N+1) \]

\[ H = \frac{12}{240} (238.05 + 151.25 + 672.8) - 48 = 5.105 \]

Step 5: Adjust for Ties

We have two ties:

80 appears twice (ranks 7 and 8 → average 7.5)

85 appears twice (ranks 10 and 11 → average 10.5)

  • For value 80: \(t = 2\)
  • For value 85: \(t = 2\)

\[ \sum (t^3 - t) = (2^3 - 2) + (2^3 - 2) = (8 - 2) + (8 - 2) = 6 + 6 = 12 \]

Total observations: \(N = 15\)

\[ \text{Correction Factor} = 1 - \frac{\sum (t^3 - t)}{N^3 - N} \]

\[ = 1 - \frac{12}{3375 - 15} = 1 - \frac{12}{3360} = 1 - 0.00357 = 0.99643 \]

Adjusted H Statistic

Original \(H = 5.105\)

\[ H_{\text{adjusted}} = \frac{5.105}{0.99643} \approx 5.123 \]

Step 6: Critical Value and Decision

Degrees of freedom = k-1 = 2

Chi-square critical value at α=0.05 is 5.991

Since 5.123 < 5.991, we fail to reject H₀

Conclusion: No significant difference in test scores among the three teaching methods.

6.3 Solved Example

Compare reaction times across four age groups (α = 0.05)

Group 1 (20-30): 280, 295, 310, 290, 285

Group 2 (31-40): 320, 335, 310, 325, 330, 315

Group 3 (41-50): 340, 355, 350, 345, 360

Group 4 (51-60): 370, 385, 380, 375, 390, 395, 400

Solution

Step 1: State Hypotheses

H₀: The medians of the four groups are equal

H₁: At least one group has a different median

Step 2: Combine and Rank All Reaction Times

We have 5+6+5+7 = 23 observations.

Due to the large dataset, we’ll summarize the ranks:

Group 1: 280 (1), 285 (2), 290 (3), 295 (4), 310 (5.5) [tie with Group 2’s 310]

Group 2: 310 (5.5), 315 (7), 320 (8), 325 (9), 330 (10), 335 (11)

Group 3: 340 (12), 345 (13), 350 (14), 355 (15), 360 (16)

Group 4: 370 (17), 375 (18), 380 (19), 385 (20), 390 (21), 395 (22), 400 (23)

Step 3: Calculate Rank Sums for Each Group

R₁ = 1 + 2 + 3 + 4 + 5.5 = 15.5

R₂ = 5.5 + 7 + 8 + 9 + 10 + 11 = 50.5

R₃ = 12 + 13 + 14 + 15 + 16 = 70

R₄ = 17 + 18 + 19 + 20 + 21 + 22 + 23 = 140

Step 4: Calculate H Statistic

  • Total observations: \(N = 23\)
  • Group sizes:
    • \(n_1 = 5\)
    • \(n_2 = 6\)
    • \(n_3 = 5\)
    • \(n_4 = 7\)
  • Sum of ranks:
    • \(R_1 = 15.5\)
    • \(R_2 = 50.5\)
    • \(R_3 = 70\)
    • \(R_4 = 140\)

\[ H = \frac{12}{N(N+1)} \left( \frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \frac{R_3^2}{n_3} + \frac{R_4^2}{n_4} \right) - 3(N+1) \]

\[ H = \frac{12}{552} (48.05 + 425.04 + 980 + 2800) - 72 = 20.46 \]

Step 5: Adjust for Ties

One tie: the value 310 appears twice.

\[ \sum (t^3 - t) = 2^3 - 2 = 8 - 2 = 6 \]

Correction factor:

\[ 1 - \frac{6}{N^3 - N} = 1 - \frac{6}{12167 - 23} = 1 - \frac{6}{12144} = 1 - 0.000494 = 0.999506 \]

Adjusted H Statistic

\[ H_{\text{adjusted}} = \frac{20.46}{0.999506} \approx 20.47 \]

Step 6: Critical Value and Decision

Degrees of freedom = 4-1 = 3

Chi-square critical value at α=0.05 is 7.815

Since 20.47 > 7.815, we reject H₀

Conclusion: There is a significant difference in reaction times among the age groups.

R Code Verification

# Example 1: Teaching Methods
method_A <- c(78, 85, 92, 65, 70)
method_B <- c(72, 68, 80, 75, 82)
method_C <- c(90, 88, 95, 85, 80)

# Combine into a list
scores <- list(method_A, method_B, method_C)

# Kruskal-Wallis Test
kruskal.test(scores)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  scores
## Kruskal-Wallis chi-squared = 5.1233, df = 2, p-value = 0.07718
# Example 2: Reaction Times
group1 <- c(280, 295, 310, 290, 285)
group2 <- c(320, 335, 310, 325, 330, 315)
group3 <- c(340, 355, 350, 345, 360)
group4 <- c(370, 385, 380, 375, 390, 395, 400)

times <- list(group1, group2, group3, group4)
kruskal.test(times)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  times
## Kruskal-Wallis chi-squared = 20.469, df = 3, p-value = 0.0001357

6.4 Exercise & Assignments

  1. A company wants to compare employee satisfaction across four departments. Satisfaction scores (0-100) are collected:

HR: 75, 82, 68, 90, 78, 85

IT: 72, 68, 80, 75, 82, 79, 74

Finance: 90, 88, 95, 85, 80, 92

Marketing: 65, 70, 72, 68, 75, 80, 78

Required

  • Perform manual Kruskal-Wallis H test (show all steps)

  • Test at α = 0.05 significance level

  • Adjust for ties if necessary

  • Verify with R code

  • Interpret results for management

  2. Compare plant growth (in cm) under four different fertilizers:

Fertilizer A: 15, 18, 22, 17, 20, 25

Fertilizer B: 28, 25, 30, 27, 32, 29, 31

Fertilizer C: 20, 23, 19, 21, 24, 22, 26, 18

Fertilizer D: 35, 40, 38, 42, 36, 39, 41

Required

  • Perform manual Kruskal-Wallis H test with normal approximation

  • Calculate effect size (epsilon-squared)

  • Conduct post-hoc pairwise comparisons using Mann-Whitney U tests with Bonferroni correction

  • Write comprehensive statistical report

# Assignment Question 1 Template
HR <- c(75, 82, 68, 90, 78, 85)
IT <- c(72, 68, 80, 75, 82, 79, 74)
Finance <- c(90, 88, 95, 85, 80, 92)
Marketing <- c(65, 70, 72, 68, 75, 80, 78)

satisfaction <- list(HR, IT, Finance, Marketing)
kruskal.test(satisfaction)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  satisfaction
## Kruskal-Wallis chi-squared = 12.759, df = 3, p-value = 0.005189
# Assignment Question 2 Template
fert_A <- c(15, 18, 22, 17, 20, 25)
fert_B <- c(28, 25, 30, 27, 32, 29, 31)
fert_C <- c(20, 23, 19, 21, 24, 22, 26, 18)
fert_D <- c(35, 40, 38, 42, 36, 39, 41)

growth <- list(fert_A, fert_B, fert_C, fert_D)
kruskal.test(growth)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  growth
## Kruskal-Wallis chi-squared = 22.778, df = 3, p-value = 4.493e-05
# Post-hoc tests with Bonferroni correction
pairwise.wilcox.test(unlist(growth), rep(c("A","B","C","D"), times=sapply(growth, length)), 
                     p.adjust.method = "bonferroni")
## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
## exact p-value with ties
## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
## exact p-value with ties
## 
##  Pairwise comparisons using Wilcoxon rank sum test with continuity correction 
## 
## data:  unlist(growth) and rep(c("A", "B", "C", "D"), times = sapply(growth, length)) 
## 
##   A      B      C     
## B 0.0253 -      -     
## C 1.0000 0.0037 -     
## D 0.0070 0.0035 0.0019
## 
## P value adjustment method: bonferroni
# Effect size
kruskal_result <- kruskal.test(growth)
N <- sum(sapply(growth, length))
H <- kruskal_result$statistic
epsilon_squared <- (H - (length(growth)-1)) / (N - 1)
epsilon_squared
## Kruskal-Wallis chi-squared 
##                  0.7325068

7 The Run Test

A run is a succession of identical letters (or other kinds of symbols) which is preceded and followed by different letters or by no letter at all. Consider the following arrangement of defective, d, and non-defective, n, pieces produced in the given order by a certain machine:

nnnnn dddd nnnnnnnnnn dd nn dddd n dd nn

In this case there is first a run of 5 n's, then a run of 4 d's, then a run of 10 n's, ..., and finally a run of 2 n's; in all there are 9 runs of varying length. The total number of runs appearing in an arrangement of this kind is often a good indication of a possible lack of randomness. If there are too few runs, we might suspect a definite grouping, clustering, or perhaps a trend; if there are too many runs, we might suspect some sort of repeated alternating pattern. In our illustration, there seems to be some clustering. If we have \(n_1\) letters of one kind and \(n_2\) letters of another kind, there are

\[ \binom{n_1 +n_2}{n_1} \]

possible arrangements of these letters. All arrangements are regarded as equally likely.

The Run Test is a non-parametric statistical test used to determine if a sequence of data values occurs in a random order. It analyzes the pattern of “runs” - sequences of consecutive observations that are either above or below a specified value (usually the median).

Key Concepts:

  • Run: A sequence of consecutive observations all above or all below the cutoff value

  • Purpose: Tests the randomness of data sequence

  • Non-parametric: No assumptions about distribution shape

  • Applications: Quality control, time series analysis, random number generation testing

  • Types of Run Tests:

  • Above/Below Median Test: Uses median as cutoff

  • Above/Below Mean Test: Uses mean as cutoff

  • Up/Down Run Test: Tests sequences of increases/decreases

Hypotheses:

H₀: The sequence is random

H₁: The sequence is not random

Test Statistic:

  • R = number of runs

  • For large samples: Normal approximation with:

\[\text{Mean: } \mu_R =\frac{2n_1 n_2}{n_1+n_2} + 1\]

\[\text{Variance: } \sigma_R^2 = \frac{2n_1 n_2(2n_1 n_2 - n_1 - n_2)}{(n_1+n_2)^2(n_1+n_2-1)}\]

Example

If we have 2 letters of one kind, a, and 2 letters of another kind, b, then the number of possible arrangements of these letters is

The combination equation is given by:

\[ \binom{2 + 2}{2} = \binom{4}{2} = 6 \]

These arrangements are

aabb abab abba baba bbaa baab

with 2, 4, 3, 4, 2 and 3 runs respectively.

Let there be \(n_1\) letters of one kind and \(n_2\) letters of another kind.

Let \(U\) be the number of runs formed by these \(n_1 + n_2\) letters.

If \(U=2k\), where \(k\) is a positive integer, then we have \(k\) runs of the \(n_1\) letters and \(k\) runs of the \(n_2\) letters.

The number of ways in which \(n_1\) letters can form \(k\) runs is

\[ \binom{n_1 -1}{k-1} \]

The number of ways in which \(n_2\) letters can form \(k\) runs is

\[ \binom{n_2 -1}{k-1} \]

The probability mass function for the number of runs \(U = 2k\) is given by:

\[ P(U = 2k) = f(u) = \frac{ 2 \binom{n_1 - 1}{k - 1} \binom{n_2 - 1}{k - 1} }{ \binom{n_1 + n_2}{n_1} } \]

Probabilities of \(U\) runs

If \(U = 2k + 1\), then

\[ f(u) = P[U_1 = k + 1, U_2 = k] + P[U_1 = k, U_2 = k + 1] \]

Hence,

\[ f(u) = \frac{ \binom{n_1 - 1}{k} \binom{n_2 - 1}{k - 1} }{ \binom{n_1 + n_2}{n_1} } + \frac{ \binom{n_1 - 1}{k - 1} \binom{n_2 - 1}{k} }{ \binom{n_1 + n_2}{n_1} } \]

\[ f(u) = \frac{ \binom{n_1 - 1}{k} \binom{n_2 - 1}{k - 1} + \binom{n_1 - 1}{k - 1} \binom{n_2 - 1}{k} }{ \binom{n_1 + n_2}{n_1} } \]
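The two expressions for \(f(u)\) translate directly into R. Below is a minimal sketch (the helper name runs_pmf is mine, not from any package):

runs_pmf <- function(u, n1, n2) {
    # exact probability of u runs from n1 letters of one kind and n2 of
    # another, using f(u) for U = 2k (even) and U = 2k + 1 (odd) as above
    denom <- choose(n1 + n2, n1)
    if (u %% 2 == 0) {
        k <- u / 2
        2 * choose(n1 - 1, k - 1) * choose(n2 - 1, k - 1) / denom
    } else {
        k <- (u - 1) / 2
        (choose(n1 - 1, k) * choose(n2 - 1, k - 1) +
           choose(n1 - 1, k - 1) * choose(n2 - 1, k)) / denom
    }
}

# sanity check: the probabilities over all possible run counts sum to 1
sum(sapply(2:8, runs_pmf, n1 = 4, n2 = 4))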

When \(n_1\) and \(n_2\) are small, tests of randomness based on \(U\) (the number of runs) are usually performed with the use of special tables, such as Table XI. We reject the null hypothesis of randomness at the level of significance \(\alpha\) if

\[ U \leq U_{\alpha / 2}' \quad \text{or} \quad U \geq U_{1 - \alpha / 2}' \]

where \(U'_{\alpha/2}\) is the largest value for which \(P(U\leq U'_{\alpha/2})\) does not exceed \(\alpha/2\), and \(U'_{1-\alpha/2}\) is the smallest value for which \(P(U\geq U'_{1-\alpha/2})\) does not exceed \(\alpha/2\).

7.1 Solved Example

Checking on LCM trees that were planted many years ago along a country road, a county official obtained the following arrangement of healthy (H) and diseased (D) trees:

HHHHDDDHHHHHHHHDDHDDDD

Test whether this arrangement may be regarded as random at \(\alpha=0.05\)

Solution

\(H_0\): Arrangement is random vs \(H_1\): Arrangement is not random

Here \(n_1=13\) (H's) and \(n_2=9\) (D's). From the tables, we reject \(H_0\) if \(U\leq 6\) or \(U\geq 17\). The observed number of runs is \(U=6\).

Conclusion We reject the \(H_0\) and conclude that the arrangement of healthy and diseased trees is not random.
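As a check on the tabled rejection region, the exact lower-tail probability of the observed run count can be computed with the runs_pmf sketch above (here \(n_1=13\) H's and \(n_2=9\) D's):

# exact P(U <= 6) for n1 = 13, n2 = 9; a value at or below alpha/2 = 0.025
# is consistent with the tabled critical value U <= 6
sum(sapply(2:6, runs_pmf, n1 = 13, n2 = 9))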

7.2 Solved Example

To test whether a radio signal contains a message or constitutes random noise, an interval of time is subdivided into a number of very short intervals, and for each of these it is determined whether the signal strength exceeds (E) or does not exceed (N) a certain level of background noise. Using the normal approximation, test whether the following arrangement thus obtained may be regarded as random, and hence that the signal does not contain a message.

NNNENENENEEENEEENEENENEENEENNENEEENENNNENNENNNNE

Solution

\(H_0\): Arrangement is random vs \(H_1\): Arrangement is not random

If \(U\) is the number of runs, then the expected value and variance of \(U\) are given by:

\[ E(U) = \frac{2n_1 n_2}{n_1 + n_2} + 1 \]

Substituting \(n_1 = n_2 = 24\):

\[ E(U) = \frac{2 \times 24 \times 24}{24 + 24} + 1 = 25 \]

The variance of \(U\) is:

\[ Var(U) = \frac{2n_1 n_2 (2n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)} \]

Substituting the values:

\[ Var(U) = \frac{2 \times 24 \times 24 (2 \times 24 \times 24 - 24 - 24)}{(24 + 24)^2 (24 + 24 - 1)} = 11.74 \]

Hence, the standardized test statistic is:

\[ Z = \frac{U \pm 0.5 - E(U)}{\sqrt{Var(U)}} \]

Substituting \(U = 30\):

\[ Z = \frac{30 - 0.5 - 25}{\sqrt{11.74}} = 1.312 \]

Note

If \(U<E(U)\), add 0.5; if \(U>E(U)\), subtract 0.5 (continuity correction).

The critical values are \(\pm 2.575\) at \(\alpha=0.01\).

Conclusion

Since \(-2.575<1.312<2.575\), we fail to reject \(H_0\) and conclude that the signal does not contain a message at \(\alpha=0.01\).
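The whole calculation can be reproduced in R from the raw letter string; a minimal sketch (object names are mine):

signal <- strsplit("NNNENENENEEENEEENEENENEENEENNENEEENENNNENNENNNNE", "")[[1]]
n1 <- sum(signal == "E")
n2 <- sum(signal == "N")
U  <- 1 + sum(signal[-1] != signal[-length(signal)])   # one new run per change of letter

EU   <- 2 * n1 * n2 / (n1 + n2) + 1
VarU <- 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / ((n1 + n2)^2 * (n1 + n2 - 1))
cc   <- ifelse(U < EU, 0.5, -0.5)                      # continuity correction
(Z   <- (U + cc - EU) / sqrt(VarU))                    # about 1.31, as above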

7.3 Solved Example

Test if the following sequence of 20 measurements is random (α = 0.05):

Data: 45, 52, 38, 60, 47, 55, 42, 51, 49, 53, 58, 62, 55, 65, 60, 57, 63, 59, 61, 58

Solution

Step 1: Calculate Median

Sorted data: 38, 42, 45, 47, 49, 51, 52, 53, 55, 55, 57, 58, 58, 59, 60, 60, 61, 62, 63, 65

Median = (55 + 57)/2 = 56

Step 2: Convert to A/B Sequence

A = Above median, B = Below median

45(B), 52(B), 38(B), 60(A), 47(B), 55(B), 42(B), 51(B), 49(B), 53(B), 58(A), 62(A), 55(B), 65(A), 60(A), 57(A), 63(A), 59(A), 61(A), 58(A)

Step 3: Identify Runs

Sequence: B, B, B, A, B, B, B, B, B, B, A, A, B, A, A, A, A, A, A, A

Runs: BBB | A | BBBBBB | AA | B | AAAAAAA

Number of runs (R) = 6

Step 4: Count n₁ and n₂

n₁ (A’s) = 10

n₂ (B’s) = 10

Total n = 20

Step 5: Calculate Mean and Variance

\[\mu_R = \frac{2n_1 n_2}{n_1+n_2} + 1 = \frac{2\times 10\times 10}{10+10} + 1 = 11\]

\[\sigma_R^2 = \frac{2n_1 n_2(2n_1 n_2 - n_1 - n_2)}{(n_1+n_2)^2(n_1+n_2-1)} = \frac{200\times 180}{400\times 19} = 4.737\]

\[\sigma_R = \sqrt{4.737} = 2.176\]

Step 6: Calculate Test Statistic

\[Z = \frac{R - \mu_R}{\sigma_R} = \frac{6 - 11}{2.176} = -2.297\]

Step 7: Critical Value and Decision

Two-tailed test, α = 0.05

Critical Z-value = ±1.96

-2.297 < -1.96 ⇒ Reject H₀

Conclusion: Sequence is not random.

7.4 Solved Example

Test if the following stock price sequence shows random fluctuations (α = 0.05):

Prices: 100, 102, 98, 105, 103, 107, 110, 108, 112, 115, 113, 118, 120, 119, 122

Solution

Step 1: Calculate Differences

Price Difference Sign
100
102 +2 +
98 -4 -
105 +7 +
103 -2 -
107 +4 +
110 +3 +
108 -2 -
112 +4 +
115 +3 +
113 -2 -
118 +5 +
120 +2 +
119 -1 -
122 +3 +

Step 2: Identify Runs in Sign Sequence

Sequence: +, -, +, -, +, +, -, +, +, -, +, +, -, +

Runs: + | - | + | - | ++ | - | ++ | - | ++ | - | +

Number of runs (R) = 11

Step 3: Calculate Parameters

For the up/down run test with n = 14 sign differences:

\[\mu_R = \frac{2n - 1}{3} = \frac{2\times 14 - 1}{3} = \frac{27}{3} = 9\]

\[\sigma_R^2 = \frac{16n - 29}{90} = \frac{16\times 14 - 29}{90} = \frac{195}{90} = 2.167\]

\[\sigma_R = \sqrt{2.167} = 1.472\]

Step 4: Calculate Test Statistic

\[Z = \frac{R - \mu_R}{\sigma_R} = \frac{11 - 9}{1.472} = 1.359\]

Step 5: Critical Value and Decision

Two-tailed test, α = 0.05

Critical Z-value = ±1.96

1.359 < 1.96 ⇒ Fail to reject H₀

Conclusion: Stock price fluctuations appear random.

R Code Verification

# Example 1: Above/Below Median Test
data1 <- c(45, 52, 38, 60, 47, 55, 42, 51, 49, 53, 58, 62, 55, 65, 60, 57, 63, 59, 61, 58)

# Using randtests package
if(require(randtests)) {
    runs.test(data1, threshold = median(data1))
}
## Loading required package: randtests
## 
##  Runs Test
## 
## data:  data1
## statistic = -2.2973, runs = 6, n1 = 10, n2 = 10, n = 20, p-value =
## 0.0216
## alternative hypothesis: nonrandomness
# Manual calculation function
manual_runs_test <- function(data, cutoff = median(data)) {
    # Create sequence of A/B
    sequence <- ifelse(data > cutoff, "A", "B")
    cat("Sequence:", sequence, "\n")
    
    # Count runs
    runs <- 1
    for(i in 2:length(sequence)) {
        if(sequence[i] != sequence[i-1]) {
            runs <- runs + 1
        }
    }
    cat("Number of runs (R):", runs, "\n")
    
    # Count n1 and n2
    n1 <- sum(sequence == "A")
    n2 <- sum(sequence == "B")
    cat("n1 (A's):", n1, "n2 (B's):", n2, "\n")
    
    # Calculate mean and variance
    mu_R <- (2*n1*n2)/(n1+n2) + 1
    var_R <- (2*n1*n2*(2*n1*n2 - n1 - n2))/((n1+n2)^2*(n1+n2-1))
    sigma_R <- sqrt(var_R)
    
    cat("Mean (μ_R):", mu_R, "\n")
    cat("Variance (σ²_R):", var_R, "\n")
    cat("SD (σ_R):", sigma_R, "\n")
    
    # Calculate Z-statistic
    Z <- (runs - mu_R)/sigma_R
    cat("Z-statistic:", Z, "\n")
    
    return(list(runs = runs, Z = Z, p_value = 2*pnorm(-abs(Z))))
}

manual_runs_test(data1)
## Sequence: B B B A B B B B B B A A B A A A A A A A 
## Number of runs (R): 6 
## n1 (A's): 10 n2 (B's): 10 
## Mean (μ_R): 11 
## Variance (σ²_R): 4.736842 
## SD (σ_R): 2.176429 
## Z-statistic: -2.297341
## $runs
## [1] 6
## 
## $Z
## [1] -2.297341
## 
## $p_value
## [1] 0.0215993
# Example 2: Up/Down Run Test
prices <- c(100, 102, 98, 105, 103, 107, 110, 108, 112, 115, 113, 118, 120, 119, 122)

# Using randtests package
if(require(randtests)) {
    runs.test(prices, alternative = "two.sided", plot = TRUE)
}

## 
##  Runs Test
## 
## data:  prices
## statistic = -3.3381, runs = 2, n1 = 7, n2 = 7, n = 14, p-value =
## 0.0008436
## alternative hypothesis: nonrandomness
# Manual up/down test
manual_up_down_test <- function(data) {
    # Calculate differences and signs
    differences <- diff(data)
    signs <- sign(differences)
    cat("Differences:", differences, "\n")
    cat("Signs:", signs, "\n")
    
    # Count runs
    runs <- 1
    for(i in 2:length(signs)) {
        if(signs[i] != signs[i-1]) {
            runs <- runs + 1
        }
    }
    cat("Number of runs (R):", runs, "\n")
    
    n <- length(signs)
    mu_R <- (2*n - 1)/3
    var_R <- (16*n - 29)/90
    sigma_R <- sqrt(var_R)
    
    cat("n:", n, "\n")
    cat("Mean (μ_R):", mu_R, "\n")
    cat("Variance (σ²_R):", var_R, "\n")
    cat("SD (σ_R):", sigma_R, "\n")
    
    Z <- (runs - mu_R)/sigma_R
    cat("Z-statistic:", Z, "\n")
    
    return(list(runs = runs, Z = Z, p_value = 2*pnorm(-abs(Z))))
}

manual_up_down_test(prices)
## Differences: 2 -4 7 -2 4 3 -2 4 3 -2 5 2 -1 3 
## Signs: 1 -1 1 -1 1 1 -1 1 1 -1 1 1 -1 1 
## Number of runs (R): 11 
## n: 14 
## Mean (μ_R): 9 
## Variance (σ²_R): 2.166667 
## SD (σ_R): 1.47196 
## Z-statistic: 1.358732
## $runs
## [1] 11
## 
## $Z
## [1] 1.358732
## 
## $p_value
## [1] 0.1742314

7.5 Exercise & Assignment

  1. A sample of 36 tools produced by a machine shows the following sequence of good (G) and bad (B) tools:

GGGBGGGGGBBBGGGGGGBBGGGGGGGGGBBGGGGG

Test whether the machine produced good (G) and bad (B) tools in random order at the 5% level of significance.

  2. The following are the numbers of students absent from school on 30 consecutive school days:

49 46 57 37 45 57 50 65 34 50 58 46 47 42 53 67 40 52 49 66 53 48 68 40 53 49 61 54 48 38

Test for randomness at 5% significance level

  3. A driver buys gasoline either at a BP station (B) or at a Total station (T), and the following arrangement shows the order of the stations from which she bought gasoline over a certain period of time.

BBTBBBBTTBBBBTBTTBBBBTBTTBBBBBTBBBTBTBBBTTBTTTTBBBBBTTBBBTTT

Test for randomness at 5% significance level

  4. A manufacturing process produces items with the following diameters (mm):

Data: 25.1, 25.3, 24.8, 25.5, 24.9, 25.2, 24.7, 25.4, 25.0, 25.6, 24.6, 25.7, 24.5, 25.8, 24.4, 25.9, 24.3, 26.0, 24.2, 26.1

Required:

  • Perform runs test above/below median (manual + R)

  • Test at α = 0.05 significance level

  • Calculate exact p-value

  • Interpret results for quality control

  • Suggest possible causes if non-random

  5. Daily returns (%) for a stock over 25 days (Financial Data Analysis):

Returns: 1.2, -0.8, 2.1, -1.5, 0.9, -2.3, 1.8, -0.7, 2.5, -1.9, 0.6, -2.7, 1.4, -0.5, 2.8, -2.1, 0.8, -3.2, 1.7, -0.9, 2.3, -1.8, 0.7, -2.9, 1.5

Required:

  • Perform up/down runs test (manual + R)

  • Test at α = 0.01 significance level

  • Calculate confidence interval for number of runs

  • Compare with above/below mean test

  • Write comprehensive analysis report

R Code Template for Assignments

# Assignment Question 4 Template (diameters)
diameters <- c(25.1, 25.3, 24.8, 25.5, 24.9, 25.2, 24.7, 25.4, 25.0, 25.6,
               24.6, 25.7, 24.5, 25.8, 24.4, 25.9, 24.3, 26.0, 24.2, 26.1)

# Using randtests package
if(require(randtests)) {
    result1 <- runs.test(diameters, threshold = median(diameters))
    print(result1)
}
## 
##  Runs Test
## 
## data:  diameters
## statistic = 4.1352, runs = 20, n1 = 10, n2 = 10, n = 20, p-value =
## 3.546e-05
## alternative hypothesis: nonrandomness
# Manual calculation
manual_runs_test(diameters)
## Sequence: B A B A B A B A B A B A B A B A B A B A 
## Number of runs (R): 20 
## n1 (A's): 10 n2 (B's): 10 
## Mean (μ_R): 11 
## Variance (σ²_R): 4.736842 
## SD (σ_R): 2.176429 
## Z-statistic: 4.135215
## $runs
## [1] 20
## 
## $Z
## [1] 4.135215
## 
## $p_value
## [1] 3.54623e-05
# Assignment Question 5 Template (daily returns)
returns <- c(1.2, -0.8, 2.1, -1.5, 0.9, -2.3, 1.8, -0.7, 2.5, -1.9,
             0.6, -2.7, 1.4, -0.5, 2.8, -2.1, 0.8, -3.2, 1.7, -0.9,
             2.3, -1.8, 0.7, -2.9, 1.5)

# Up/down runs test
if(require(randtests)) {
    result2 <- runs.test(returns)
    print(result2)
}
## 
##  Runs Test
## 
## data:  returns
## statistic = 4.1742, runs = 23, n1 = 12, n2 = 12, n = 24, p-value =
## 2.99e-05
## alternative hypothesis: nonrandomness
# Manual up/down test
manual_up_down_test(returns)
## Differences: -2 2.9 -3.6 2.4 -3.2 4.1 -2.5 3.2 -4.4 2.5 -3.3 4.1 -1.9 3.3 -4.9 2.9 -4 4.9 -2.6 3.2 -4.1 2.5 -3.6 4.4 
## Signs: -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 
## Number of runs (R): 24 
## n: 24 
## Mean (μ_R): 15.66667 
## Variance (σ²_R): 3.944444 
## SD (σ_R): 1.986063 
## Z-statistic: 4.195907
## $runs
## [1] 24
## 
## $Z
## [1] 4.195907
## 
## $p_value
## [1] 2.71782e-05
# Above/below mean test for comparison
if(require(randtests)) {
    result3 <- runs.test(returns, threshold = mean(returns))
    print(result3)
}
## 
##  Runs Test
## 
## data:  returns
## statistic = 4.715, runs = 25, n1 = 13, n2 = 12, n = 25, p-value =
## 2.417e-06
## alternative hypothesis: nonrandomness
# Comprehensive analysis function
comprehensive_runs_analysis <- function(data, alpha = 0.05) {
    cat("=== COMPREHENSIVE RUNS ANALYSIS ===\n")
    cat("Sample size:", length(data), "\n")
    cat("Mean:", mean(data), "\n")
    cat("Median:", median(data), "\n\n")
    
    # Above/below median test
    cat("1. ABOVE/BELOW MEDIAN TEST:\n")
    med_test <- manual_runs_test(data, median(data))
    cat("P-value:", med_test$p_value, "\n")
    cat("Significant at", alpha, "level:", med_test$p_value < alpha, "\n\n")
    
    # Up/down test
    cat("2. UP/DOWN RUNS TEST:\n")
    up_down_test <- manual_up_down_test(data)
    cat("P-value:", up_down_test$p_value, "\n")
    cat("Significant at", alpha, "level:", up_down_test$p_value < alpha, "\n\n")
    
    # Confidence interval for runs
    n1 <- sum(data > median(data))
    n2 <- sum(data < median(data))
    mu_R <- (2*n1*n2)/(n1+n2) + 1
    var_R <- (2*n1*n2*(2*n1*n2 - n1 - n2))/((n1+n2)^2*(n1+n2-1))
    sigma_R <- sqrt(var_R)
    
    z_critical <- qnorm(1 - alpha/2)
    CI_lower <- mu_R - z_critical * sigma_R
    CI_upper <- mu_R + z_critical * sigma_R
    
    cat("3. CONFIDENCE INTERVAL FOR RUNS:\n")
    cat(100*(1-alpha), "% CI: [", CI_lower, ", ", CI_upper, "]\n")
    cat("Observed runs:", med_test$runs, "\n")
    cat("Within CI:", (med_test$runs >= CI_lower & med_test$runs <= CI_upper), "\n")
}

comprehensive_runs_analysis(returns)
## === COMPREHENSIVE RUNS ANALYSIS ===
## Sample size: 25 
## Mean: -0.04 
## Median: 0.6 
## 
## 1. ABOVE/BELOW MEDIAN TEST:
## Sequence: A B A B A B A B A B B B A B A B A B A B A B A B A 
## Number of runs (R): 23 
## n1 (A's): 12 n2 (B's): 13 
## Mean (μ_R): 13.48 
## Variance (σ²_R): 5.9696 
## SD (σ_R): 2.443276 
## Z-statistic: 3.896407 
## P-value: 9.763021e-05 
## Significant at 0.05 level: TRUE 
## 
## 2. UP/DOWN RUNS TEST:
## Differences: -2 2.9 -3.6 2.4 -3.2 4.1 -2.5 3.2 -4.4 2.5 -3.3 4.1 -1.9 3.3 -4.9 2.9 -4 4.9 -2.6 3.2 -4.1 2.5 -3.6 4.4 
## Signs: -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 
## Number of runs (R): 24 
## n: 24 
## Mean (μ_R): 15.66667 
## Variance (σ²_R): 3.944444 
## SD (σ_R): 1.986063 
## Z-statistic: 4.195907 
## P-value: 2.71782e-05 
## Significant at 0.05 level: TRUE 
## 
## 3. CONFIDENCE INTERVAL FOR RUNS:
## 95 % CI: [ 8.304616 ,  17.69538 ]
## Observed runs: 23 
## Within CI: FALSE

8 Rank Correlation Coefficient

Since the assumptions underlying the significance test for the correlation coefficient are rather stringent, it is sometimes preferable to use a non-parametric alternative. The most popular among the non-parametric measures of association is the rank correlation coefficient, also called Spearman’s rank correlation.

For a given set of data \((x_1,y_1), (x_2,y_2), \ldots , (x_n,y_n)\), the rank correlation coefficient is obtained by ranking the \(x\)'s among themselves and also the \(y\)'s, both from low to high or both from high to low. The rank correlation coefficient is then given by

\[r_s=1-\frac{6\sum d_{i}^{2}}{n(n^2 -1)}\]

Where \(d_i\) is the difference between the ranks assigned to \(x_i\) and \(y_i\)

Where there are ties in rank we assign the tied observations the mean of the ranks which they jointly occupy.

8.1 Solved Example

The following are the number of hours which ten students studied for an examination and the marks which they obtained.

No. of Hours Marks Scored
8 56
5 44
11 79
13 72
10 70
5 54
18 94
15 85
2 33
8 65

Required

Calculate rank correlation coefficient

Solution

No. of Hours Marks Scored Rank (H) Rank (M) (d^2)
8 56 6.5 7 0.25
5 44 8.5 9 0.25
11 79 4 3 1.00
13 72 3 4 1.00
10 70 5 5 0.00
5 54 8.5 8 0.25
18 94 1 1 0.00
15 85 2 2 0.00
2 33 10 10 0.00
8 65 6.5 6 0.25

\[r_s=1-\frac{6\sum d_{i}^{2}}{n(n^2 -1)}=1-\frac{6*3}{10(99)}=0.98\]
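The same coefficient can be obtained in R with cor() (variable names are mine). Note that with tied observations cor() computes the Pearson correlation of the mid-ranks, which can differ slightly from the \(1-6\sum d_i^2/[n(n^2-1)]\) hand formula:

hours <- c(8, 5, 11, 13, 10, 5, 18, 15, 2, 8)
marks <- c(56, 44, 79, 72, 70, 54, 94, 85, 33, 65)
cor(hours, marks, method = "spearman")   # close to the hand value of 0.98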

For small values of \(n\) (\(n\leq 10\)), the test of the null hypothesis of no correlation may be based on special tables determined from the exact sampling distribution of \(r_s\). Most of the time, though, we use the fact that the distribution of \(r_s\) can be approximated closely by a normal distribution.

Theorem

Under the null hypothesis of no correlation the mean and variance of \(r_s\) are;

\[E(r_s)=0\] \[Var(r_s)=\frac{1}{n-1}\]

NOTE

Strictly speaking, the theorem applies when there are no ties, but the result can be used as an approximation unless the number of ties is large.

8.2 Solved Example

Using the data of the solved example above (Section 8.1), test whether the rank correlation coefficient is significant at \(\alpha=0.01\).

Solution

\(H_0\): There is no correlation vs \(H_1\): There is correlation

Test statistic

\[ Z=\frac{r_s -E(r_s)}{\sqrt{Var(r_s)}}=r_s \sqrt{n-1}=0.98\times 3=2.94\]

We reject \(H_0\) if \(Z\leq -2.575\) or \(Z\geq 2.575\)

Conclusion: Since 2.94 > 2.575, we reject \(H_0\) and conclude that there is a significant positive relationship between study time and marks.
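A short R sketch of this large-sample test, using the mean and variance from the theorem:

n   <- 10
r_s <- 0.98
Z   <- r_s * sqrt(n - 1)                 # since E(r_s) = 0, Var(r_s) = 1/(n-1)
c(Z = Z, p.value = 2 * pnorm(-abs(Z)))   # Z = 2.94, well beyond 2.575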

NOTE

When there are no ties in rank, \(r_s\) actually equals the correlation coefficient \(r\) calculated for the ranks. Let \(r_i\) and \(s_i\) be the ranks of \(x_i\) and \(y_i\). We find that

\[ \sum_{i=1}^{n} r_i = \sum_{i=1}^{n} s_i = \frac{n(n+1)}{2} \]

\[ \sum_{i=1}^{n} r_i^2 = \sum_{i=1}^{n} s_i^2 = \frac{n(n+1)(2n+1)}{6} \]

Let
\[ d_i = r_i - s_i \]

Then,
\[ d_i^2 = (r_i - s_i)^2 = r_i^2 - 2r_i s_i + s_i^2 \]

Summing over all \(i\),
\[ \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} r_i^2 - 2\sum_{i=1}^{n} r_i s_i + \sum_{i=1}^{n} s_i^2 \]

Rearranging,
\[ 2\sum_{i=1}^{n} r_i s_i = \sum_{i=1}^{n} r_i^2 + \sum_{i=1}^{n} s_i^2 - \sum_{i=1}^{n} d_i^2 \]

Hence,
\[ \sum_{i=1}^{n} r_i s_i = \frac{\sum_{i=1}^{n} r_i^2 + \sum_{i=1}^{n} s_i^2 - \sum_{i=1}^{n} d_i^2}{2} \]

Substituting
\(\sum r_i^2 = \sum s_i^2 = \frac{n(n+1)(2n+1)}{6}\),
we get:

\[ \sum_{i=1}^{n} r_i s_i = \frac{n(n+1)(2n+1)}{6} - \frac{\sum_{i=1}^{n} d_i^2}{2} \]

\[ r = \frac{ \sum_{i=1}^{n} r_i s_i - \frac{ \left(\sum_{i=1}^{n} r_i\right) \left(\sum_{i=1}^{n} s_i\right) }{n} }{ \sqrt{ \left(\sum_{i=1}^{n} r_i^2 - n\overline{r}^2\right) \left(\sum_{i=1}^{n} s_i^2 - n\overline{s}^2\right) } } \]

\[ r = \frac{ \frac{n(n+1)(2n+1)}{6} - \frac{\sum_{i=1}^{n} d_i^2}{2} - \frac{1}{n} \frac{n(n+1)}{2} \frac{n(n+1)}{2} }{ \sqrt{ \left[\frac{n(n+1)(2n+1)}{6} - n\left(\frac{n+1}{2}\right)^2\right] \left[\frac{n(n+1)(2n+1)}{6} - n\left(\frac{n+1}{2}\right)^2\right] } } \]

\[ r = \frac{ \left[\frac{n(n+1)(2n+1)}{6} - \frac{(n\frac{(n+1)}{2})^2}{n}\right] - \frac{\sum_{i=1}^{n} d_i^2}{2} }{ \left[\frac{n(n+1)(2n+1)}{6} - \frac{(n\frac{(n+1)}{2})^2}{n}\right] } \]

\[ r = 1 - \frac{ \frac{\sum_{i=1}^{n} d_i^2}{2} }{ \left[ \frac{n(n+1)(2n+1)}{6} - \frac{(n\frac{(n+1)}{2})^2}{n} \right] } \]

Workings

\[ \left[\frac{n(n+1)(2n+1)}{6}-\frac{\big(n\frac{(n+1)}{2}\big)^{2}}{n}\right]. \]

Factor and simplify:

\[ \begin{aligned} \frac{n(n+1)(2n+1)}{6}-\frac{\big(n\frac{(n+1)}{2}\big)^{2}}{n} &= \frac{n+1}{12}\big[2n(2n+1)-3n(n+1)\big] \\[6pt] &= \frac{n+1}{12}\big[4n^2+2n-3n^2-3n\big] \\[6pt] &= \frac{n+1}{12}\big[n^2-n\big] \\[6pt] &= \frac{n(n+1)(n-1)}{12} \\[6pt] &= \frac{n(n^2-1)}{12}. \end{aligned} \]

Substituting this into the earlier expression for \(r\) gives

\[ r = 1 - \frac{\dfrac{\sum_{i=1}^{n} d_i^2}{2}}{\dfrac{n(n^2-1)}{12}} = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)}. \]

This is the standard form of Spearman’s rank correlation coefficient:

\[ r_s \;=\; 1 - \frac{6\sum d_i^2}{n(n^2-1)}. \]
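The identity can be checked numerically on tie-free data, where the \(d_i^2\) formula and the Pearson correlation of the ranks must agree exactly (the example vectors are mine):

x  <- c(3, 1, 4, 9, 2, 6)
y  <- c(8, 2, 7, 9, 1, 5)
rx <- rank(x); ry <- rank(y)
d  <- rx - ry
n  <- length(x)
c(formula = 1 - 6 * sum(d^2) / (n * (n^2 - 1)),   # both equal 0.7143
  pearson_on_ranks = cor(rx, ry))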

8.3 Exercise & Assessment

  1. The following table shows how a panel of nutrition experts and a panel of housewives ranked 15 breakfast foods on their palatability.
Nutrition Housewives
3 5
7 4
11 8
9 14
1 2
4 6
10 12
8 7
5 1
13 15
12 9
2 3
15 10
6 11
4 13

Required

Calculate the rank correlation coefficient and test the null hypothesis of no correlation at \(\alpha=0.05\).

  2. The following are the marks which 12 students obtained in the mid-term and final examinations in a course in Statistics
Midterm Final Exam
71 83
49 62
80 76
73 77
93 89
85 74
58 48
52 78
64 76
32 51
87 73
80 89

Required

Calculate the rank correlation coefficient and test the null hypothesis of no correlation at \(\alpha=0.05\).

  3. The Sociology marks (X) and Statistics marks (Y) of 10 students are given below
X Y
78 80
86 74
49 63
94 85
53 55
97 90
74 85
53 71
58 67
62 64

Required: Test whether the rank correlation coefficient is significant at the 5% level of significance.

  4. An educational researcher wants to compare the effectiveness of two study methods. She randomly selects 15 students and has each student study a particular topic using method A and method B (in random order). The students’ test scores (out of 100) are recorded for both methods. The data is shown below:
Student Score (Method A) Score (Method B)
1 22 53
2 37 68
3 36 42
4 38 49
5 42 51
6 38 65
7 58 51
8 60 71
9 62 55
10 65 74
11 66 68
12 56 64
13 66 67
14 67 73
15 62 65

Required: Test if there is a significant difference between the two study methods at the 5% significance level.

9 The Sample Distribution Function (Empirical Distribution Function)

Suppose that the random variable \(X_1, X_2, ... ,X_n\) form a random sample from some continuous distribution and let \(x_1, x_2, ... ,x_n\) denote the observed values of \(X_1, X_2, ... ,X_n\).

Since the observations come from a continuous distribution, there is probability zero that any two of the observed values \(x_1, x_2, \ldots ,x_n\) will be equal. Therefore, we shall assume for simplicity that all \(n\) values are different. We consider a function \(F_n(x)\) which is constructed from the values \(x_1, x_2, \ldots ,x_n\) as follows.

For each number \(x\ (-\infty <x<\infty)\), the value of \(F_n(x)\) is defined as the proportion of observed values in the sample that are less than or equal to \(x\). In other words, if exactly \(k\) of the observed values in the sample are less than or equal to \(x\), then \(F_n(x)=\frac{k}{n}\). The function \(F_n(x)\) defined in this way is called the sample distribution function (s.d.f.). Sometimes \(F_n(x)\) is called the empirical distribution function.

If we let \(y_1<y_2< \ldots <y_n\) denote the values of the order statistics of the sample, then \(F_n(x)=0\) for \(x<y_1\); \(F_n(x)\) jumps to the value \(\frac{1}{n}\) at \(x=y_1\) and remains at \(\frac{1}{n}\) for \(y_1\leq x<y_2\); \(F_n(x)\) jumps to \(\frac{2}{n}\) at \(x=y_2\) and remains at \(\frac{2}{n}\) for \(y_2\leq x<y_3\); and so on.

Now let \(F(x)\) denote the distribution function of the distribution from which the random sample \(x_1, x_2, \ldots ,x_n\) was drawn. For any given number \(x\ (-\infty <x<\infty)\), the probability that any particular \(X\) is less than or equal to \(x\) is \(F(x)\). Therefore, it follows from the law of large numbers that as \(n\rightarrow \infty\), the proportion \(F_n(x)\) of the observations in the sample which are less than or equal to \(x\) will converge to \(F(x)\) in probability. In symbols,

\[ F_n(x)\ \xrightarrow{\;P\;}\ F(x), \qquad -\infty<x<\infty, \]

where \(n\) is the size of the sample \(x_1,x_2, \ldots ,x_n\).

The relation above expresses the fact that at each point \(x\), the sample distribution function \(F_n(x)\) converges to the actual distribution function \(F(x)\) of the distribution from which the random sample was drawn.

The Sample Distribution Function (also called Empirical Distribution Function - EDF) is a non-parametric estimator of the cumulative distribution function (CDF) of a random variable. It is defined as the proportion of sample observations that are less than or equal to a given value x.

For a sample \(X_1, X_2, \ldots, X_n\), the Empirical Distribution Function (EDF) is defined as:

\[F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \leq x)\]

where \(I(\cdot)\) is the indicator function that equals 1 if the condition is true and 0 otherwise.
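The definition translates into a one-line R sketch:

# F_n(x) = (1/n) * sum of the indicators I(X_i <= x)
edf_value <- function(data, x) mean(data <= x)
edf_value(c(1, 3, 5, 7), 4)   # 0.5: two of the four observations are <= 4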

Key Properties:

  • Non-parametric: Makes no assumptions about underlying distribution

  • Step function: Jumps by \(1/n\) at each data point (or by \(k/n\) for \(k\) ties)

  • Consistent estimator: \(F_n(x)\to F(x)\) as \(n\to\infty\) (Glivenko-Cantelli Theorem)

  • Unbiased: \(E[F_n(x)]=F(x)\)

  • Applications:
  1. Goodness-of-fit tests (Kolmogorov-Smirnov, Anderson-Darling)

  2. Non-parametric confidence intervals

  3. Visualizing data distribution

  4. Comparing multiple distributions

9.1 Solved Example

Given the following sample of 10 measurements:

Data: 12.5, 15.2, 18.7, 14.1, 16.8, 13.9, 17.3, 15.6, 19.2, 14.8

Construct the empirical distribution function.

Solution

Step 1: Sort the Data

Sorted values: 12.5, 13.9, 14.1, 14.8, 15.2, 15.6, 16.8, 17.3, 18.7, 19.2

Step 2: Calculate EDF Values

Interval # of observations ≤ x F_n(x)
x < 12.5 0 0.0
12.5 ≤ x < 13.9 1 0.1
13.9 ≤ x < 14.1 2 0.2
14.1 ≤ x < 14.8 3 0.3
14.8 ≤ x < 15.2 4 0.4
15.2 ≤ x < 15.6 5 0.5
15.6 ≤ x < 16.8 6 0.6
16.8 ≤ x < 17.3 7 0.7
17.3 ≤ x < 18.7 8 0.8
18.7 ≤ x < 19.2 9 0.9
x ≥ 19.2 10 1.0

Step 3: EDF Formula \[ F_n(x) = \begin{cases} 0.0 & \text{if } x < 12.5 \\ 0.1 & \text{if } 12.5 \leq x < 13.9 \\ 0.2 & \text{if } 13.9 \leq x < 14.1 \\ 0.3 & \text{if } 14.1 \leq x < 14.8 \\ 0.4 & \text{if } 14.8 \leq x < 15.2 \\ 0.5 & \text{if } 15.2 \leq x < 15.6 \\ 0.6 & \text{if } 15.6 \leq x < 16.8 \\ 0.7 & \text{if } 16.8 \leq x < 17.3 \\ 0.8 & \text{if } 17.3 \leq x < 18.7 \\ 0.9 & \text{if } 18.7 \leq x < 19.2 \\ 1.0 & \text{if } x \geq 19.2 \end{cases} \]

Step 4: Calculate Specific Values

  • Fₙ(14.0) = 0.2 (since 13.9 ≤ 14.0 < 14.1)

  • Fₙ(15.5) = 0.5

  • Fₙ(17.0) = 0.7

  • Fₙ(20.0) = 1.0

9.2 Solved Example : Manual Computation with Ties

Given the following sample with repeated values:

Data: 8, 12, 8, 15, 12, 18, 15, 20, 12, 8

Construct the empirical distribution function.

Solution

Step 1: Sort the Data and Count Frequencies

Sorted values: 8, 8, 8, 12, 12, 12, 15, 15, 18, 20

Value Frequency Cumulative Frequency
8 3 3
12 3 6
15 2 8
18 1 9
20 1 10

Step 2: Calculate EDF Values

Interval # of observations ≤ x F_n(x)
x < 8 0 0.0
8 ≤ x < 12 3 0.3
12 ≤ x < 15 6 0.6
15 ≤ x < 18 8 0.8
18 ≤ x < 20 9 0.9
x ≥ 20 10 1.0

Step 3: EDF Formula

\[ F_n(x) = \begin{cases} 0.0 & \text{if } x < 8 \\ 0.3 & \text{if } 8 \leq x < 12 \\ 0.6 & \text{if } 12 \leq x < 15 \\ 0.8 & \text{if } 15 \leq x < 18 \\ 0.9 & \text{if } 18 \leq x < 20 \\ 1.0 & \text{if } x \geq 20 \end{cases} \]

Step 4: Calculate Specific Values

  • Fₙ(10) = 0.3

  • Fₙ(12) = 0.6

  • Fₙ(16) = 0.8

  • Fₙ(19) = 0.9

R Code Implementation

# Example 1: Continuous Data
data1 <- c(12.5, 15.2, 18.7, 14.1, 16.8, 13.9, 17.3, 15.6, 19.2, 14.8)

# Using ecdf function
ecdf1 <- ecdf(data1)
print(ecdf1)
## Empirical CDF 
## Call: ecdf(data1)
##  x[1:10] =   12.5,   13.9,   14.1,  ...,   18.7,   19.2
# Calculate specific values
cat("F(14.0) =", ecdf1(14.0), "\n")
## F(14.0) = 0.2
cat("F(15.5) =", ecdf1(15.5), "\n")
## F(15.5) = 0.5
cat("F(17.0) =", ecdf1(17.0), "\n")
## F(17.0) = 0.7
cat("F(20.0) =", ecdf1(20.0), "\n")
## F(20.0) = 1
# Plot EDF
plot(ecdf1, main = "Empirical Distribution Function", 
     xlab = "x", ylab = "F(x)", col = "blue")
grid()

# Manual calculation function
manual_edf <- function(data, x_values) {
    sorted_data <- sort(data)
    n <- length(data)
    
    cat("Sorted data:", sorted_data, "\n")
    cat("Sample size n =", n, "\n\n")
    
    for(x in x_values) {
        count <- sum(data <= x)
        F_x <- count/n
        cat(sprintf("F(%.1f) = %d/%d = %.2f\n", x, count, n, F_x))
    }
    
    # Return EDF function
    edf_function <- function(x) {
        sapply(x, function(x_val) sum(data <= x_val)/n)
    }
    
    return(edf_function)
}

# Test manual function
x_test <- c(14.0, 15.5, 17.0, 20.0)
manual_edf(data1, x_test)
## Sorted data: 12.5 13.9 14.1 14.8 15.2 15.6 16.8 17.3 18.7 19.2 
## Sample size n = 10 
## 
## F(14.0) = 2/10 = 0.20
## F(15.5) = 5/10 = 0.50
## F(17.0) = 7/10 = 0.70
## F(20.0) = 10/10 = 1.00
## function (x) 
## {
##     sapply(x, function(x_val) sum(data <= x_val)/n)
## }
## <bytecode: 0x000002445f304740>
## <environment: 0x000002446070ae40>
# Example 2: Data with Ties
data2 <- c(8, 12, 8, 15, 12, 18, 15, 20, 12, 8)

ecdf2 <- ecdf(data2)
print(ecdf2)
## Empirical CDF 
## Call: ecdf(data2)
##  x[1:5] =      8,     12,     15,     18,     20
# Calculate specific values
cat("F(10) =", ecdf2(10), "\n")
## F(10) = 0.3
cat("F(12) =", ecdf2(12), "\n")
## F(12) = 0.6
cat("F(16) =", ecdf2(16), "\n")
## F(16) = 0.8
cat("F(19) =", ecdf2(19), "\n")
## F(19) = 0.9
# Plot both EDFs for comparison
plot(ecdf1, main = "Comparison of EDFs", col = "blue", lwd = 2)
plot(ecdf2, col = "red", lwd = 2, add = TRUE)
legend("bottomright", legend = c("Data1", "Data2"), 
       col = c("blue", "red"), lwd = 2)
grid()

9.3 Exercise & Assignment

  1. A researcher collected reaction times (in milliseconds) from 15 participants:

Data: 245, 289, 267, 312, 278, 295, 301, 256, 324, 288, 273, 299, 315, 282, 307

Required:

  • Construct the empirical distribution function manually

  • Calculate Fₙ(280), Fₙ(300), Fₙ(320)

  • Plot the EDF using R

  • Estimate P(270 ≤ X ≤ 310) using the EDF

  • Find the 80th percentile using the EDF

  2. Two different teaching methods were tested. Exam scores were recorded:

Method A: 78, 85, 92, 65, 70, 88, 75, 82, 95, 80

Method B: 72, 68, 80, 75, 82, 79, 74, 85, 78, 90

Required:

  • Construct EDFs for both groups manually

  • Plot both EDFs on the same graph using R

  • Calculate Fₙ(80) for both groups

  • Estimate the median from each EDF

  • Perform visual comparison and interpret differences

  • Calculate the maximum vertical distance between EDFs (Kolmogorov-Smirnov statistic)

R Code Template for Assignments

# Assignment Question 1 Template
reaction_times <- c(245, 289, 267, 312, 278, 295, 301, 256, 324, 288, 273, 299, 315, 282, 307)

# Manual EDF construction function
construct_edf <- function(data) {
    sorted <- sort(data)
    n <- length(data)
    
    cat("Sorted data:", sorted, "\n")
    cat("Sample size:", n, "\n\n")
    
    # Create EDF table
    unique_vals <- unique(sorted)
    edf_table <- data.frame(
        x_min = c(-Inf, unique_vals),
        x_max = c(unique_vals, Inf),
        F_n = c(0, cumsum(table(sorted))/n)
    )
    
    print(edf_table)
    
    # EDF function
    edf <- function(x) {
        sapply(x, function(x_val) sum(data <= x_val)/n)
    }
    
    return(edf)
}

edf_rt <- construct_edf(reaction_times)
## Sorted data: 245 256 267 273 278 282 288 289 295 299 301 307 312 315 324 
## Sample size: 15 
## 
##     x_min x_max        F_n
##      -Inf   245 0.00000000
## 245   245   256 0.06666667
## 256   256   267 0.13333333
## 267   267   273 0.20000000
## 273   273   278 0.26666667
## 278   278   282 0.33333333
## 282   282   288 0.40000000
## 288   288   289 0.46666667
## 289   289   295 0.53333333
## 295   295   299 0.60000000
## 299   299   301 0.66666667
## 301   301   307 0.73333333
## 307   307   312 0.80000000
## 312   312   315 0.86666667
## 315   315   324 0.93333333
## 324   324   Inf 1.00000000
# Calculate specific values
cat("F(280) =", edf_rt(280), "\n")
## F(280) = 0.3333333
cat("F(300) =", edf_rt(300), "\n")
## F(300) = 0.6666667
cat("F(320) =", edf_rt(320), "\n")
## F(320) = 0.9333333
# Probability estimation
P_270_310 <- edf_rt(310) - edf_rt(270)
cat("P(270 ≤ X ≤ 310) ≈", P_270_310, "\n")
## P(270 ≤ X ≤ 310) ≈ 0.6
# Percentile estimation
find_percentile <- function(edf, data, p) {
    sorted <- sort(data)
    n <- length(data)
    index <- ceiling(p * n)
    return(sorted[index])
}

cat("80th percentile ≈", find_percentile(edf_rt, reaction_times, 0.8), "\n")
## 80th percentile ≈ 307
# Plot
plot(ecdf(reaction_times), main = "EDF of Reaction Times", 
     xlab = "Reaction Time (ms)", ylab = "F(x)", col = "blue", lwd = 2)
grid()

# Assignment Question 2 Template
method_A <- c(78, 85, 92, 65, 70, 88, 75, 82, 95, 80)
method_B <- c(72, 68, 80, 75, 82, 79, 74, 85, 78, 90)

# Construct EDFs
edf_A <- construct_edf(method_A)
## Sorted data: 65 70 75 78 80 82 85 88 92 95 
## Sample size: 10 
## 
##    x_min x_max F_n
##     -Inf    65 0.0
## 65    65    70 0.1
## 70    70    75 0.2
## 75    75    78 0.3
## 78    78    80 0.4
## 80    80    82 0.5
## 82    82    85 0.6
## 85    85    88 0.7
## 88    88    92 0.8
## 92    92    95 0.9
## 95    95   Inf 1.0
edf_B <- construct_edf(method_B)
## Sorted data: 68 72 74 75 78 79 80 82 85 90 
## Sample size: 10 
## 
##    x_min x_max F_n
##     -Inf    68 0.0
## 68    68    72 0.1
## 72    72    74 0.2
## 74    74    75 0.3
## 75    75    78 0.4
## 78    78    79 0.5
## 79    79    80 0.6
## 80    80    82 0.7
## 82    82    85 0.8
## 85    85    90 0.9
## 90    90   Inf 1.0
# Compare specific values
cat("F_A(80) =", edf_A(80), "\n")
## F_A(80) = 0.5
cat("F_B(80) =", edf_B(80), "\n")
## F_B(80) = 0.7
# Medians
median_A <- find_percentile(edf_A, method_A, 0.5)
median_B <- find_percentile(edf_B, method_B, 0.5)
cat("Median A =", median_A, "\n")
## Median A = 80
cat("Median B =", median_B, "\n")
## Median B = 78
# Kolmogorov-Smirnov statistic
ks_statistic <- function(data1, data2) {
    all_vals <- sort(unique(c(data1, data2)))
    edf1 <- ecdf(data1)
    edf2 <- ecdf(data2)
    
    differences <- abs(edf1(all_vals) - edf2(all_vals))
    max_diff <- max(differences)
    
    cat("K-S statistic =", max_diff, "\n")
    return(max_diff)
}

ks_value <- ks_statistic(method_A, method_B)
## K-S statistic = 0.2
# Plot comparison
plot(ecdf(method_A), main = "Comparison of EDFs: Method A vs B", 
     xlab = "Exam Score", ylab = "F(x)", col = "blue", lwd = 2)
plot(ecdf(method_B), col = "red", lwd = 2, add = TRUE)
legend("bottomright", legend = c("Method A", "Method B"), 
       col = c("blue", "red"), lwd = 2)
grid()

10 Kolmogorov-Smirnov Test

10.1 The Kolmogorov-Smirnov Test

Suppose that we wish to test the simple null hypothesis that the unknown distribution function \(F(x)\) is actually a particular continuous distribution function \(F^*(x)\), against the general alternative that the actual distribution function is not \(F^*(x)\), i.e.

\[H_0 : F(x)=F^* (x), \quad -\infty < x < \infty\] \[H_1 : F(x) \neq F^* (x), \quad -\infty < x < \infty \]

This problem is a non-parametric problem because the unknown distribution from which the random sample was taken might be any continuous distribution. Let \(F_n(x)\) denote the sample distribution function (s.d.f.).

Let \[D_n=\sup_{-\infty<x<\infty} |F_n(x)- F^{*}(x)|\]

In other words, \(D_n\) is the maximum difference between the s.d.f. \(F_n(x)\) and the hypothesized \(F^*(x)\). When the null hypothesis is true, the probability distribution of \(D_n\) is a certain distribution which is the same for every possible continuous distribution function \(F^*(x)\); it does not depend on the particular \(F^*(x)\) being studied in a specified problem.

Tables of this distribution for various values of the sample size \(n\) have been developed and are tabulated in collections of statistical tables. The tables give the value of \(D_n^\alpha\) such that \[P\left[D_n \leq D_n^ \alpha\right]=1-\alpha\] when \(H_0\) is true.

\[P\left[D_n > D_n^ \alpha\right]=\alpha\]
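For moderate to large \(n\), a widely used approximation to the two-sided critical value is \(D_n^{\alpha}\approx \sqrt{-\ln(\alpha/2)/2}\,/\sqrt{n}\), which gives the familiar \(1.358/\sqrt{n}\) at \(\alpha=0.05\); the same approximation is used in the assignment template later in this section:

ks_critical <- function(n, alpha = 0.05) sqrt(-log(alpha / 2) / 2) / sqrt(n)
ks_critical(25)   # about 0.27, matching the tabled value used in the next example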

The Kolmogorov-Smirnov (K-S) test is a non-parametric statistical test used to determine whether a sample comes from a specified distribution (one-sample test) or whether two samples come from the same distribution (two-sample test).

Key Concepts:

  • Non-parametric: No assumptions about distribution parameters

  • Based on EDF: Uses the Empirical Distribution Function

  • Test statistic: Maximum vertical distance between distributions

  • Sensitive to differences: In location, shape, and scale

Types of K-S Tests:

1. One-sample K-S test: Compares sample EDF with theoretical CDF

2. Two-sample K-S test: Compares EDFs of two samples

Hypotheses:

  • H₀: The sample follows the specified distribution (one-sample) or both samples come from the same distribution (two-sample)

  • H₁: The sample does not follow the specified distribution or the samples come from different distributions

Test Statistic

For one-sample KS Test

The test statistic \(D\) is defined as:

\[D = \sup_x \left| F_n(x) - F(x) \right|\]

  • \(F_n(x)\): Empirical distribution function (EDF) of the sample.
  • \(F(x)\): Theoretical cumulative distribution function (CDF).
  • \(\sup_x\): The supremum (maximum) over all values of \(x\).

This measures the largest absolute difference between the empirical and theoretical distributions.

Two-Sample KS Test

The test statistic \(D\) is defined as:

\[D = \sup_x \left| F_{n1}(x) - F_{n2}(x) \right|\]

  • \(F_{n1}(x)\): EDF of the first sample.
  • \(F_{n2}(x)\): EDF of the second sample.

This measures the largest absolute difference between the two empirical distributions.

10.2 Solved Example

Test the hypothesis, by the Kolmogorov-Smirnov test, that the following sample comes from a standard normal distribution. Take \(\alpha=0.05\).

-1.23 1.64 -0.21 0.70 1.40
0.44 -0.07 1.07 -0.02 -0.15
1.76 1.62 0.40 -2.11 -0.99
-0.42 0.81 1.47 -2.46 0.88
1.39 0.42 0.27 -0.39 -0.10

Solution

\[H_0 : F(x)=F^*(x), \quad -2.46 < x < 1.76\] \[H_1 : F(x) \neq F^*(x), \quad -2.46 < x < 1.76 \]

where \(F^*(x)\) is \(\Phi(x)\), the cumulative distribution function of the standard normal distribution.

\[ D_{25} = \sup_{-2.46 < x < 1.76} \, \big| F_{25}(x) - F^{*}(x) \big| = 0.137 \]

\(D_{25}^{0.05}=0.27\)

\(D_{25} =0.137<0.27\)

Conclusion: Since \(D_{25} =0.137<0.27\), we fail to reject \(H_0\) and conclude that the sample comes from a standard normal distribution at \(\alpha=0.05\).

Workings

We arrange the sample values in ascending order and, for each \(x\), determine \(F_n(x)\) and \(F^*(x)\).

i xᵢ Fₙ(x) F*(x) Absolute diff
1 -2.46 0.04 0.007 0.033
2 -2.11 0.08 0.017 0.063
3 -1.23 0.12 0.109 0.011
4 -0.99 0.16 0.161 0.001
5 -0.42 0.20 0.337 0.137
6 -0.39 0.24 0.348 0.108
7 -0.21 0.28 0.417 0.137
8 -0.15 0.32 0.440 0.120
9 -0.10 0.36 0.460 0.100
10 -0.07 0.40 0.472 0.072
11 -0.02 0.44 0.492 0.052
12 0.27 0.48 0.606 0.126
13 0.40 0.52 0.655 0.135
14 0.42 0.56 0.663 0.103
15 0.44 0.60 0.670 0.070
16 0.70 0.64 0.758 0.118
17 0.81 0.68 0.791 0.111
18 0.88 0.72 0.811 0.091
19 1.07 0.76 0.858 0.098
20 1.39 0.80 0.918 0.118
21 1.40 0.84 0.919 0.079
22 1.47 0.88 0.927 0.047
23 1.62 0.92 0.947 0.027
24 1.64 0.96 0.950 0.010
25 1.76 1.00 0.961 0.039
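The example can be checked in R (data re-entered from the table above). Note that ks.test() also examines \(F^*\) just below each jump, so its reported \(D\) may slightly exceed the hand value of 0.137; the decision against the critical value 0.27 is unchanged:

x <- c(-1.23, 1.64, -0.21, 0.70, 1.40,
       0.44, -0.07, 1.07, -0.02, -0.15,
       1.76, 1.62, 0.40, -2.11, -0.99,
       -0.42, 0.81, 1.47, -2.46, 0.88,
       1.39, 0.42, 0.27, -0.39, -0.10)
ks.test(x, "pnorm")   # D and p-value against the standard normal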

10.3 Solved Example : One-Sample K-S Test (Manual)

Test whether the following sample comes from a normal distribution with μ=50, σ=10 (α=0.05)

Data: 45, 52, 58, 48, 60, 55, 53, 50, 47, 56

Solution

Step 1: State Hypotheses

  • H₀: The data follows N(50, 10²)

  • H₁: The data does not follow N(50, 10²)

Step 2: Sort Data and Calculate EDF

Sorted data: 45, 47, 48, 50, 52, 53, 55, 56, 58, 60

n = 10

(x) (i) F_n(x) = i/n
45 1 0.1
47 2 0.2
48 3 0.3
50 4 0.4
52 5 0.5
53 6 0.6
55 7 0.7
56 8 0.8
58 9 0.9
60 10 1.0

Step 3: Calculate Theoretical CDF F(x)

Using standard normal:\[z = (x - 50)/10\]

(x) (z) (F(z))
45 -0.5 0.3085
47 -0.3 0.3821
48 -0.2 0.4207
50 0.0 0.5000
52 0.2 0.5793
53 0.3 0.6179
55 0.5 0.6915
56 0.6 0.7257
58 0.8 0.7881
60 1.0 0.8413

Step 4: Calculate Differences

x Fₙ(x) F(x) |Fₙ(x) − F(x)|
45 0.1 0.3085 0.2085
47 0.2 0.3821 0.1821
48 0.3 0.4207 0.1207
50 0.4 0.5000 0.1000
52 0.5 0.5793 0.0793
53 0.6 0.6179 0.0179
55 0.7 0.6915 0.0085
56 0.8 0.7257 0.0743
58 0.9 0.7881 0.1119
60 1.0 0.8413 0.1587

Step 5: Find Maximum Difference

\[D = \sup_x \left| F_n(x) - F(x) \right| = \max(0.2085, 0.1821, \ldots, 0.1587) = 0.2085\]

(Strictly, the supremum should also be checked just below each jump, comparing \(F(x_{(i)})\) with \(F_n(x_{(i-1)})=(i-1)/n\); doing so gives \(D = 0.3085\) at \(x=45\), as the R verification below shows. Either value leads to the same decision here.)

Step 6: Critical Value

For n=10, α=0.05, the critical value from K-S table is 0.409

Step 7: Decision

\[D = 0.2085 < 0.409\] ⇒ Fail to reject H₀

Conclusion: No evidence that the data doesn’t follow N(50,10²)

10.4 Solved Example : Two-Sample K-S Test (Manual)

Test whether two samples come from the same distribution (α=0.05)

Sample A: 12, 15, 18, 20, 22

Sample B: 10, 14, 16, 19, 24

Solution

Step 1: State Hypotheses

  • H₀: Both samples come from the same distribution

  • H₁: The samples come from different distributions

Step 2: Combine and Sort All Data

Combined sorted: 10(B), 12(A), 14(B), 15(A), 16(B), 18(A), 19(B), 20(A), 22(A), 24(B)

Step 3: Calculate EDFs for Both Samples

n₁ = 5, n₂ = 5

x Source Fₙ₁(x) Fₙ₂(x) Absolute Difference
10 B 0/5 = 0.0 1/5 = 0.2 0.2
12 A 1/5 = 0.2 1/5 = 0.2 0.0
14 B 1/5 = 0.2 2/5 = 0.4 0.2
15 A 2/5 = 0.4 2/5 = 0.4 0.0
16 B 2/5 = 0.4 3/5 = 0.6 0.2
18 A 3/5 = 0.6 3/5 = 0.6 0.0
19 B 3/5 = 0.6 4/5 = 0.8 0.2
20 A 4/5 = 0.8 4/5 = 0.8 0.0
22 A 5/5 = 1.0 4/5 = 0.8 0.2
24 B 5/5 = 1.0 5/5 = 1.0 0.0

Step 4: Calculate Test Statistic

\[D =\sup_x \left|F_{n_1}(x) - F_{n_2}(x)\right| = 0.2 \]

Step 5: Critical Value

For n₁=n₂=5, α=0.05, the critical value from K-S table is 0.6

Step 6: Decision

D = 0.2 < 0.6 ⇒ Fail to reject H₀

Conclusion: No evidence that the samples come from different distributions

R Code Implementation

# Example 1: One-Sample K-S Test
data1 <- c(45, 52, 58, 48, 60, 55, 53, 50, 47, 56)

# Test against normal distribution with mean=50, sd=10
ks_result1 <- ks.test(data1, "pnorm", mean=50, sd=10)
print(ks_result1)
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  data1
## D = 0.30854, p-value = 0.2421
## alternative hypothesis: two-sided
# Manual calculation verification
manual_ks_one_sample <- function(data, dist_func, ...) {
    n <- length(data)
    sorted_data <- sort(data)
    
    # Empirical CDF
    F_n <- (1:n)/n
    
    # Theoretical CDF
    F_theoretical <- dist_func(sorted_data, ...)
    
    # Calculate differences
    D_plus <- max(F_n - F_theoretical)
    D_minus <- max(F_theoretical - c(0, F_n[-n]))
    D <- max(D_plus, D_minus)
    
    cat("D+ =", D_plus, "\n")
    cat("D- =", D_minus, "\n")
    cat("D =", D, "\n")
    
    return(D)
}

D_manual <- manual_ks_one_sample(data1, pnorm, mean=50, sd=10)
## D+ = 0.1586553 
## D- = 0.3085375 
## D = 0.3085375
# Example 2: Two-Sample K-S Test
sample_A <- c(12, 15, 18, 20, 22)
sample_B <- c(10, 14, 16, 19, 24)

ks_result2 <- ks.test(sample_A, sample_B)
print(ks_result2)
## 
##  Exact two-sample Kolmogorov-Smirnov test
## 
## data:  sample_A and sample_B
## D = 0.2, p-value = 1
## alternative hypothesis: two-sided
# Manual calculation for two-sample
manual_ks_two_sample <- function(sample1, sample2) {
    n1 <- length(sample1)
    n2 <- length(sample2)
    
    # Combine and sort all data
    all_data <- sort(c(sample1, sample2))
    
    # Calculate EDFs
    F1 <- sapply(all_data, function(x) sum(sample1 <= x)/n1)
    F2 <- sapply(all_data, function(x) sum(sample2 <= x)/n2)
    
    # Calculate maximum difference
    D <- max(abs(F1 - F2))
    
    cat("Maximum difference D =", D, "\n")
    
    # Create comparison table
    comparison <- data.frame(
        x = all_data,
        F1 = F1,
        F2 = F2,
        diff = abs(F1 - F2)
    )
    print(comparison)
    
    return(D)
}

D_manual_2 <- manual_ks_two_sample(sample_A, sample_B)
## Maximum difference D = 0.2 
##     x  F1  F2 diff
## 1  10 0.0 0.2  0.2
## 2  12 0.2 0.2  0.0
## 3  14 0.2 0.4  0.2
## 4  15 0.4 0.4  0.0
## 5  16 0.4 0.6  0.2
## 6  18 0.6 0.6  0.0
## 7  19 0.6 0.8  0.2
## 8  20 0.8 0.8  0.0
## 9  22 1.0 0.8  0.2
## 10 24 1.0 1.0  0.0

10.5 Exercise & Assignment

  1. Using the Kolmogorov-Smirnov method, test the hypothesis that the following values form a random sample from a normal distribution with mean 2 and variance 4. Use \(\alpha=5\%\).

2.72 3.84 0.88 5.72 5.48 3.12 0.10 2.48 1.70 0.52 3.64 3.40 1.80 -0.52 -0.12 2.30 3.10 1.04 1.02

  2. Test the hypothesis, by the Kolmogorov-Smirnov test, that the following sample comes from a normal distribution with mean 0.07 and variance 1. (Take \(\alpha=0.05\))

0.36 0.92 -0.56 1.86 1.74 0.56 -0.95 0.24 -0.15 -0.48 -0.74 0.34 0.82 0.70 -0.10 -1.26 -1.06 0.15 0.55 -0.49

  3. A manufacturer claims their product has a lifetime following an exponential distribution with mean 100 hours. Test this claim using the following sample of 12 product lifetimes (hours):

Data: 85, 120, 65, 150, 95, 110, 78, 135, 88, 125, 102, 140

Required:

  • Perform manual one-sample K-S test (show all steps)

  • Test at α=0.05 significance level

  • Calculate exact p-value using approximation

  • Verify with R code

  • Interpret results for quality control

  4. Compare the distribution of test scores between two different teaching methods:

Method A: 78, 85, 92, 65, 70, 88, 75, 82, 95, 80, 72, 68

Method B: 72, 68, 80, 75, 82, 79, 74, 85, 78, 90, 86, 83

Required

  • Perform manual two-sample K-S test (show all steps)

  • Test at α=0.05 significance level

  • Calculate the critical value for the test

  • Verify with R code

  • Write comprehensive analysis report

R Code Template for Assignments

# Assignment Question 3 Template (product lifetimes)
lifetimes <- c(85, 120, 65, 150, 95, 110, 78, 135, 88, 125, 102, 140)

# One-sample K-S test for exponential distribution with mean=100
ks_exp <- ks.test(lifetimes, "pexp", rate=1/100)
print(ks_exp)
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  lifetimes
## D = 0.47795, p-value = 0.004841
## alternative hypothesis: two-sided
# Manual calculation
manual_ks_exp <- function(data, rate) {
    n <- length(data)
    sorted_data <- sort(data)
    
    # Empirical CDF
    F_n <- (1:n)/n
    
    # Theoretical exponential CDF
    F_exp <- pexp(sorted_data, rate=rate)
    
    # Calculate differences
    D_plus <- max(F_n - F_exp)
    D_minus <- max(F_exp - c(0, F_n[-n]))
    D <- max(D_plus, D_minus)
    
    cat("Sample size:", n, "\n")
    cat("D+ =", D_plus, "\n")
    cat("D- =", D_minus, "\n")
    cat("D =", D, "\n")
    
    # Critical value approximation
    critical_value <- 1.358 / sqrt(n)
    cat("Critical value (α=0.05):", critical_value, "\n")
    cat("Reject H0:", D > critical_value, "\n")
    
    return(list(D = D, critical_value = critical_value))
}

result1 <- manual_ks_exp(lifetimes, 1/100)
## Sample size: 12 
## D+ = 0.2231302 
## D- = 0.4779542 
## D = 0.4779542 
## Critical value (α=0.05): 0.3920208 
## Reject H0: TRUE
# Assignment Question 4 Template (teaching methods)
method_A <- c(78, 85, 92, 65, 70, 88, 75, 82, 95, 80, 72, 68)
method_B <- c(72, 68, 80, 75, 82, 79, 74, 85, 78, 90, 86, 83)

# Two-sample K-S test
ks_two <- ks.test(method_A, method_B)
print(ks_two)
## 
##  Exact two-sample Kolmogorov-Smirnov test
## 
## data:  method_A and method_B
## D = 0.16667, p-value = 0.9923
## alternative hypothesis: two-sided
# Manual calculation with detailed output
detailed_ks_two_sample <- function(sample1, sample2, alpha=0.05) {
    n1 <- length(sample1)
    n2 <- length(sample2)
    
    # Combine and sort all data
    all_data <- sort(unique(c(sample1, sample2)))
    
    # Calculate EDFs
    F1 <- sapply(all_data, function(x) sum(sample1 <= x)/n1)
    F2 <- sapply(all_data, function(x) sum(sample2 <= x)/n2)
    
    differences <- abs(F1 - F2)
    D <- max(differences)
    max_index <- which.max(differences)
    
    # Create detailed table
    result_table <- data.frame(
        x = all_data,
        F1 = round(F1, 3),
        F2 = round(F2, 3),
        Difference = round(differences, 3)
    )
    
    cat("Detailed Comparison:\n")
    print(result_table)
    cat("\nMaximum difference at x =", all_data[max_index], "\n")
    cat("Test statistic D =", D, "\n")
    
    # Critical value
    c_alpha <- sqrt(-0.5 * log(alpha/2))
    critical_value <- c_alpha * sqrt((n1 + n2)/(n1 * n2))
    cat("Critical value (α=", alpha, "):", critical_value, "\n")
    cat("Reject H0:", D > critical_value, "\n")
    
    return(list(D = D, critical_value = critical_value, table = result_table))
}

result2 <- detailed_ks_two_sample(method_A, method_B)
## Detailed Comparison:
##     x    F1    F2 Difference
## 1  65 0.083 0.000      0.083
## 2  68 0.167 0.083      0.083
## 3  70 0.250 0.083      0.167
## 4  72 0.333 0.167      0.167
## 5  74 0.333 0.250      0.083
## 6  75 0.417 0.333      0.083
## 7  78 0.500 0.417      0.083
## 8  79 0.500 0.500      0.000
## 9  80 0.583 0.583      0.000
## 10 82 0.667 0.667      0.000
## 11 83 0.667 0.750      0.083
## 12 85 0.750 0.833      0.083
## 13 86 0.750 0.917      0.167
## 14 88 0.833 0.917      0.083
## 15 90 0.833 1.000      0.167
## 16 92 0.917 1.000      0.083
## 17 95 1.000 1.000      0.000
## 
## Maximum difference at x = 70 
## Test statistic D = 0.1666667 
## Critical value (α= 0.05 ): 0.5544426 
## Reject H0: FALSE
# Visualization
plot(ecdf(method_A), col="blue", main="EDF Comparison: Method A vs B",
     xlab="Test Scores", ylab="Cumulative Probability")
plot(ecdf(method_B), col="red", add=TRUE)
legend("bottomright", legend=c("Method A", "Method B"), 
       col=c("blue", "red"), lwd=2)
grid()

Advanced K-S Test Functions in R

# Comprehensive K-S analysis function
comprehensive_ks_analysis <- function(data1, data2 = NULL, dist = NULL, ...) {
    cat("=== KOLMOGOROV-SMIRNOV TEST ANALYSIS ===\n\n")
    
    if(is.null(data2)) {
        # One-sample test
        cat("ONE-SAMPLE K-S TEST\n")
        cat("H0: Data follows", dist, "distribution\n")
        cat("H1: Data does not follow", dist, "distribution\n\n")
        
        result <- ks.test(data1, dist, ...)
        print(result)
        
        # Additional statistics
        n <- length(data1)
        cat("\nSample size:", n, "\n")
        cat("Theoretical distribution:", dist, "\n")
        
    } else {
        # Two-sample test
        cat("TWO-SAMPLE K-S TEST\n")
        cat("H0: Both samples come from the same distribution\n")
        cat("H1: Samples come from different distributions\n\n")
        
        result <- ks.test(data1, data2)
        print(result)
        
        # Additional statistics
        n1 <- length(data1)
        n2 <- length(data2)
        cat("\nSample sizes: n1 =", n1, ", n2 =", n2, "\n")
        cat("Combined sample size:", n1 + n2, "\n")
    }
    
    return(result)
}

# Run analyses
comp_result1 <- comprehensive_ks_analysis(lifetimes, dist = "pexp", rate = 1/100)
## === KOLMOGOROV-SMIRNOV TEST ANALYSIS ===
## 
## ONE-SAMPLE K-S TEST
## H0: Data follows pexp distribution
## H1: Data does not follow pexp distribution
## 
## 
##  Exact one-sample Kolmogorov-Smirnov test
## 
## data:  data1
## D = 0.47795, p-value = 0.004841
## alternative hypothesis: two-sided
## 
## 
## Sample size: 12 
## Theoretical distribution: pexp
comp_result2 <- comprehensive_ks_analysis(method_A, method_B)
## === KOLMOGOROV-SMIRNOV TEST ANALYSIS ===
## 
## TWO-SAMPLE K-S TEST
## H0: Both samples come from the same distribution
## H1: Samples come from different distributions
## 
## 
##  Exact two-sample Kolmogorov-Smirnov test
## 
## data:  data1 and data2
## D = 0.16667, p-value = 0.9923
## alternative hypothesis: two-sided
## 
## 
## Sample sizes: n1 = 12 , n2 = 12 
## Combined sample size: 24

11 Confidence Intervals for Quantiles

Let \(X\) be a random variable of the continuous type with pdf \(f(x)\) and cdf \(F(x)\).

Let \(p\) denote a positive proper fraction and assume that the equation \[ F(x)=P(X\leq x)=p \] has a unique solution for \(x\). This unique root is denoted by \(\varepsilon_p\) and is called the quantile (of the distribution) of order \(p\). Therefore, \[ P(X\leq \varepsilon_p)=F(\varepsilon_p)=p \]

The quantile of order \(\frac{1}{2}\) is the median of the distribution, and \[ P(X\leq \varepsilon_{0.05})=F(\varepsilon_{0.05})=0.05 \]

To obtain a distribution-free confidence interval for \(\varepsilon_p\), the quantile of order \(p\) of a distribution of the continuous type with distribution function \(F(x)\), take a random sample \(X_1, X_2, \ldots ,X_n\) of size \(n\) from that distribution.

Let \(Y_1<Y_2< ...<Y_n\) be the order statistics of the sample.

Take \(Y_i<Y_j\) and consider the event \(Y_i<\varepsilon_p<Y_j\). For the \(i^{th}\) order statistic to be less than \(\varepsilon_p\), it must be true that at least \(i\) of the \(X\) values are less than \(\varepsilon_p\). Moreover, for the \(j^{th}\) order statistic to be greater than \(\varepsilon_p\), fewer than \(j\) of the values are less than \(\varepsilon_p\).

\[ P\left[Y_i< \varepsilon_p <Y_j\right]=\sum_{r=i}^{j-1} \frac{n!}{r!(n-r)!}p^r(1-p)^{n-r} \]

is the probability of having at least \(i\) but fewer than \(j\) of the \(X_i\)'s less than \(\varepsilon_p\).

Suppose that \[ \alpha=P\left[Y_i< \varepsilon_p<Y_j\right] \]

then the probability is \(\alpha\) that the random interval \((Y_i,Y_j)\) includes the quantile of order \(p\).

If the experimental values of \(Y_i\) and \(Y_j\) are respectively \(y_i\) and \(y_j\), the interval \((y_i,y_j)\) serves as a \(100\alpha\) percent confidence interval (CI) for \(\varepsilon_p\), the quantile of order \(p\).

A confidence interval for a quantile is an interval estimate that, with a specified confidence level, contains the true population quantile. Unlike parametric confidence intervals for means, quantile CIs are distribution-free and based on order statistics.

Key Concepts:

Quantile: The value below which a given percentage of observations fall

Order statistics: The sorted sample values \( X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)} \)

Distribution-free: No assumptions about the underlying distribution

Exact coverage: Based on binomial probabilities

Common Quantiles:

Median: 50th percentile (q = 0.5)

Quartiles: 25th, 50th, 75th percentiles

Percentiles: Any value between 0 and 1

Methods:

Exact binomial method (most common)

Normal approximation (for large samples)

11.1 Solved Example

Let \(X_1, X_2, X_3\) and \(X_4\) be a random sample of size 4 from a distribution of the continuous type.

Let \(Y_1<Y_2<Y_3<Y_4\) be the corresponding order statistics.

Required

Calculate \[P\left[Y_1< \varepsilon_{0.5}<Y_4\right]\]

Solution

We know that \[P\left[Y_i< \varepsilon_p<Y_j\right]=\sum_{r=i}^{j-1} \frac{n!}{r!(n-r)!}p^r(1-p)^{n-r}\]

\[P\left[Y_1< \varepsilon_{0.5} <Y_4\right]=\sum_{r=1}^{3} \frac{4!}{r!(4-r)!}0.5^r(1-0.5)^{4-r}\]

\[P\left[Y_1< \varepsilon_{0.5} <Y_4\right]=\left(\frac{24}{1!3!}\times 0.5^4\right)+\left(\frac{24}{2!2!}\times 0.5^4\right)+\left(\frac{24}{3!1!}\times 0.5^4\right)\]

\[P\left[Y_1< \varepsilon_{0.5} <Y_4\right]=\frac{1}{4}+\frac{3}{8}+\frac{1}{4}=0.875\]

If \(Y_1\) and \(Y_4\) are observed to be \(Y_1=2.8\) and \(Y_4=4.2\) respectively then the interval \((2.8,4.2)\) is an 87.5 percent CI for the median \(\varepsilon_{0.5}\) of the distribution.

\[P\left[Y_i< \varepsilon_p<Y_j\right]=\sum_{r=i}^{j-1} \frac{n!}{r!(n-r)!}p^r(1-p)^{n-r}\]

For samples that are fairly large, we can use the normal distribution to approximate the binomial probability.

Let \(n=27\), and suppose we want a confidence interval for \(\varepsilon_{0.25}\). We calculate \[np=27\times 0.25=6.75\simeq 7,\]

i.e., the \(7^{th}\) order statistic \(Y_7\) is a point estimate of \(\varepsilon_{0.25}\). We consider two order statistics, one less than \(Y_7\) and another greater than \(Y_7\).

Let us take \(Y_4\) and \(Y_{10}\), and calculate the probability \(\alpha\) that the interval \((Y_4, Y_{10})\) includes \(\varepsilon_{0.25}\).

\[\alpha=P\left[Y_4< \varepsilon_{0.25}<Y_{10}\right]=\sum_{r=4}^{9} \frac{27!}{r!(27-r)!}0.25^r(0.75)^{27-r}\]

If \(R\) is the number of observations less than \(\varepsilon_{0.25}\), then

\[\alpha=P\left[4\leq R\leq 9\right]=P\left[3.5< R<9.5\right]\]

But \(R \sim b(27,0.25)=b\left(27,\tfrac{1}{4}\right)\)

\[E(R)=np=27*0.25=6.75=\frac{27}{4}\]

\[Var(R)=npq=27*0.75*0.25=\frac{81}{16}\]

\[\alpha=P\left[\frac{(3.5-6.75)}{\sqrt{\frac{81}{16}}}<Z<\frac{(9.5-6.75)}{\sqrt{\frac{81}{16}}}\right]=P\left[ \frac{-13}{9}<Z<\frac{11}{9}\right]\]

\[\alpha=\phi(\frac{11}{9})-\phi(\frac{-13}{9})=0.814\]

The appropriate order statistics \(Y_i\) and \(Y_j\) are selected such that \(i\) and \(j\) are fairly symmetrically located about \(np\).
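Both the exact coverage and its normal approximation are one-liners in R:

# exact alpha for (Y_4, Y_10) with n = 27, p = 0.25, and the
# continuity-corrected normal approximation computed above
exact  <- sum(dbinom(4:9, size = 27, prob = 0.25))
approx <- pnorm((9.5 - 6.75) / sqrt(81 / 16)) - pnorm((3.5 - 6.75) / sqrt(81 / 16))
c(exact = exact, normal_approx = approx)   # the approximation is about 0.814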

11.2 Solved Example : Manual Computation for Median

Construct a 90% confidence interval for the median from the following sample:

Data: 12, 15, 18, 20, 22, 25, 28, 30, 35, 40

Solution

Step 1: Sort the Data

Sorted: 12, 15, 18, 20, 22, 25, 28, 30, 35, 40; n = 10

Step 2: Determine Order Statistics

We need to find indices r and s such that:

\[ P\left( X_{(r)} \leq \xi_{0.5} \leq X_{(s)} \right) \geq 0.90 \]

Here:

  • \(\xi_{0.5}\) is the population median.
  • \(X_{(r)}\) and \(X_{(s)}\) are the \(r\)-th and \(s\)-th order statistics from a sample of size \(n\).
  • The inequality represents a confidence interval for the median with at least 90% confidence.

Step 3: Use Binomial Probabilities

For the median (\(p=0.5\)), the number of observations \(\leq \xi_{0.5}\) follows a Binomial(\(n\), 0.5) distribution.

We want

\[ P(r \leq k \leq s - 1) \geq 0.90 \]

where

\[ k \sim \text{Binomial}(n = 10, p = 0.5) \]

Step 4: Find Appropriate Indices

From binomial tables or direct calculation, first try \(r=3\), \(s=8\), i.e., \(P(3 \leq k \leq 7)\):

\[ \begin{align*} P(k \leq 2) &= 0.0547 \\ P(k \leq 7) &= 0.9453 \\ P(3 \leq k \leq 7) &= P(k \leq 7) - P(k \leq 2) = 0.9453 - 0.0547 = 0.8906 \end{align*} \]

This yields a confidence level of approximately 89%, which falls short of the required 90%.

Next, try \(r=2\), \(s=9\), i.e., \(P(2 \leq k \leq 8)\):

\[ \begin{align*} P(k \leq 1) &= 0.0107 \\ P(k \leq 8) &= 0.9893 \\ P(2 \leq k \leq 8) &= P(k \leq 8) - P(k \leq 1) = 0.9893 - 0.0107 = 0.9786 > 0.90 \end{align*} \]

This yields a confidence level of approximately 97.86%, which satisfies the requirement.

11.3 Conclusion

To achieve at least 90% confidence that the population median lies within the interval, we select:

\[ r = 2,\quad s = 9 \]

Thus, the 90% distribution-free confidence interval for the median is:

\[ [X_{(2)}, X_{(9)}]=[15,35] \]

Step 5: Verify Coverage

The interval [15, 35] contains the true median with approximately 97.86% confidence.
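
Both candidate intervals from Step 4 can be verified directly with pbinom (a minimal check of the binomial arithmetic above):

n <- 10
pbinom(7, n, 0.5) - pbinom(2, n, 0.5)   # P(3 <= k <= 7)
## [1] 0.890625
pbinom(8, n, 0.5) - pbinom(1, n, 0.5)   # P(2 <= k <= 8)
## [1] 0.9785156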

11.4 Solved Example : Manual Computation for 75th Percentile

Construct a 95% confidence interval for the 75th percentile from:

Data: 8, 12, 15, 18, 20, 22, 25, 28, 30, 32, 35, 38

Solution

Step 1: Sort the Data

Sorted: 8, 12, 15, 18, 20, 22, 25, 28, 30, 32, 35, 38

n = 12

Step 2: Determine Order Statistics

For q = 0.75 (75th percentile), we use the binomial distribution with p = 0.75.

We want indices r and s such that:

\[ P\left( X_{(r)} \leq \xi_{0.75} \leq X_{(s)} \right) \geq 0.95 \]

This is equivalent to requiring

\[ P(r \leq k \leq s - 1) \geq 0.95 \]

where

\[ k \sim \text{Binomial}(n = 12,\ p = 0.75) \]

Step 3: Calculate Cumulative Binomial Probabilities

P(k ≤ 5) = 0.0142

P(k ≤ 6) = 0.0544

P(k ≤ 7) = 0.1576

P(k ≤ 8) = 0.3512

P(k ≤ 9) = 0.6093

P(k ≤ 10) = 0.8416

P(k ≤ 11) = 0.9683

Find interval with probability ≥ 0.95:

\[P(7 \leq k \leq 11) = P(k \leq 11) - P(k \leq 6) = 0.9683 - 0.0544 = 0.9139 \] This is too low.

\[P(6 \leq k \leq 11) = P(k \leq 11) - P(k \leq 5) = 0.9683 - 0.0142 = 0.9541 \geq 0.95\]

Therefore, use r = 6, s = 12

Step 4: Construct Confidence Interval

The 95% CI for the 75th percentile is \[[X_{(6)}, X_{(12)}] = [22, 38]\]

Step 5: Verify Coverage

The interval [22, 38] contains the true 75th percentile with approximately 95.41% confidence.
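
Again, the exact coverage can be confirmed with a one-line computation in R (this matches the exact coverage reported by the implementation below):

# P(6 <= k <= 11), k ~ Binomial(12, 0.75)
pbinom(11, 12, 0.75) - pbinom(5, 12, 0.75)
## [1] 0.9540709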

R Code Implementation

# Example 1: Median CI
data1 <- c(12, 15, 18, 20, 22, 25, 28, 30, 35, 40)

# Manual calculation function
quantile_ci <- function(data, q = 0.5, conf.level = 0.90) {
    n <- length(data)
    sorted_data <- sort(data)
    
    # Exact coverage of (X_(i), X_(j)) for the q-th quantile:
    # P(X_(i) <= xi_q <= X_(j)) = P(i <= K <= j-1), K ~ Binomial(n, q)
    prob_lower <- 0
    r <- 0
    s <- 0
    
    # Search all index pairs; among intervals with coverage >= conf.level,
    # keep the one spanning the fewest order statistics
    for(i in 1:n) {
        for(j in i:n) {
            coverage <- pbinom(j-1, n, q) - pbinom(i-1, n, q)
            if(coverage >= conf.level) {
                if(prob_lower == 0 || (j-i) < (s-r)) {
                    r <- i
                    s <- j
                    prob_lower <- coverage
                }
            }
        }
    }
    
    # Guard against the case where no index pair achieves the coverage
    if(r == 0) stop("No interval achieves the requested confidence level")
    
    ci_lower <- sorted_data[r]
    ci_upper <- sorted_data[s]
    
    cat("Sample size: n =", n, "\n")
    cat("Quantile: q =", q, "\n")
    cat("Confidence level:", conf.level, "\n")
    cat("Indices: r =", r, ", s =", s, "\n")
    cat("Confidence interval: [", ci_lower, ", ", ci_upper, "]\n")
    cat("Exact coverage probability:", prob_lower, "\n")
    
    return(list(ci = c(ci_lower, ci_upper), indices = c(r, s), 
                coverage = prob_lower))
}

# Calculate for Example 1
result1 <- quantile_ci(data1, q = 0.5, conf.level = 0.90)
## Sample size: n = 10 
## Quantile: q = 0.5 
## Confidence level: 0.9 
## Indices: r = 2 , s = 8 
## Confidence interval: [ 15 ,  30 ]
## Exact coverage probability: 0.9345703
# Example 2: 75th percentile CI
data2 <- c(8, 12, 15, 18, 20, 22, 25, 28, 30, 32, 35, 38)
result2 <- quantile_ci(data2, q = 0.75, conf.level = 0.95)
## Sample size: n = 12 
## Quantile: q = 0.75 
## Confidence level: 0.95 
## Indices: r = 6 , s = 12 
## Confidence interval: [ 22 ,  38 ]
## Exact coverage probability: 0.9540709
# Using boot package for comparison
if(require(boot)) {
    # Bootstrap method
    quantile_boot <- function(data, indices, q) {
        return(quantile(data[indices], q))
    }
    
    boot_result1 <- boot(data1, statistic = quantile_boot, R = 1000, q = 0.5)
    boot_ci1 <- boot.ci(boot_result1, conf = 0.90, type = "perc")
    cat("\nBootstrap 90% CI for median:\n")
    print(boot_ci1)
    
    boot_result2 <- boot(data2, statistic = quantile_boot, R = 1000, q = 0.75)
    boot_ci2 <- boot.ci(boot_result2, conf = 0.95, type = "perc")
    cat("\nBootstrap 95% CI for 75th percentile:\n")
    print(boot_ci2)
}
## Loading required package: boot
## 
## Bootstrap 90% CI for median:
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = boot_result1, conf = 0.9, type = "perc")
## 
## Intervals : 
## Level     Percentile     
## 90%   (18, 30 )  
## Calculations and Intervals on Original Scale
## 
## Bootstrap 95% CI for 75th percentile:
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = boot_result2, conf = 0.95, type = "perc")
## 
## Intervals : 
## Level     Percentile     
## 95%   (22.00, 35.75 )  
## Calculations and Intervals on Original Scale

11.5 Exercises & Assignments

  1. Let the order statistics of 27 observations in a random sample from a certain distribution of the continuous type be represented by the numbers

61 69 71 74 79 80 84 86 87 82 93 96 100 104 105 113 121 122 129 141 143 156 164 191 217 276

Required

  • Find a point estimate for \(\pi_{0.25}\).

  • Compute \(P(Y_4 < \pi_{0.25} < Y_{10})\), where \(\pi_{0.25}\) is the 25th percentile.

  2. Let \(Y_1<Y_2<\cdots<Y_{20}\) denote the order statistics of a random sample of size 20 from a distribution of a continuous type.

Required: Compute \(P(Y_{12}<\pi_{0.7}<Y_{15})\).

  3. Find the smallest value of \(n\) for which \(P(Y_1<\pi_{0.25}<Y_n)\geq 0.99\), where \(Y_1<Y_2<\cdots<Y_n\) are the order statistics of a random sample of size \(n\) from a continuous distribution.

  4. Find the smallest value of \(n\) for which \(P(Y_1<\pi_{0.25}<Y_n)\geq 0.95\), where \(Y_1<Y_2<\cdots<Y_n\) are the order statistics of a random sample of size \(n\) from a continuous distribution.

  5. A researcher collected reaction times (ms) from 15 participants:

Data: 245, 289, 267, 312, 278, 295, 301, 256, 324, 288, 273, 299, 315, 282, 307

Required:

  • Construct a 95% confidence interval for the median manually

  • Construct a 90% confidence interval for the 25th percentile manually

  • Verify both intervals using R

  • Calculate exact coverage probabilities

  • Interpret the results

  6. A survey collected annual incomes (in $1000) from 20 households:

Data: 45, 52, 38, 60, 47, 55, 42, 51, 49, 53, 58, 62, 55, 65, 60, 57, 63, 59, 61, 58

Required:

  • Construct 95% CIs for the 25th, 50th, and 75th percentiles manually

  • Compare the widths of these intervals

  • Use bootstrap method to construct 95% CIs for the same quantiles

  • Compare exact binomial method vs bootstrap results

  • Write a comprehensive analysis report

R Code Template for Assignments

# Assignment Question 1 Template
reaction_times <- c(245, 289, 267, 312, 278, 295, 301, 256, 324, 288, 273, 299, 315, 282, 307)

# Enhanced quantile CI function
enhanced_quantile_ci <- function(data, q = 0.5, conf.level = 0.95) {
    n <- length(data)
    sorted_data <- sort(data)
    alpha <- 1 - conf.level
    
    cat("=== QUANTILE CONFIDENCE INTERVAL ===\n")
    cat("Sample size: n =", n, "\n")
    cat("Quantile: q =", q, "\n")
    cat("Confidence level:", conf.level, "\n\n")
    
    # Method 1: Exact binomial method
    cat("1. EXACT BINOMIAL METHOD:\n")
    
    best_interval <- NULL
    best_coverage <- 0
    best_width <- Inf
    
    for(i in 1:n) {
        for(j in i:n) {
            coverage <- pbinom(j-1, n, q) - pbinom(i-1, n, q)
            if(coverage >= conf.level) {
                width <- sorted_data[j] - sorted_data[i]
                if(is.null(best_interval) || width < best_width) {
                    best_interval <- c(i, j)
                    best_coverage <- coverage
                    best_width <- width
                }
            }
        }
    }
    
    if(is.null(best_interval)) {
        cat("No interval found with required coverage.\n")
        return(NULL)
    }
    
    r <- best_interval[1]
    s <- best_interval[2]
    ci_lower <- sorted_data[r]
    ci_upper <- sorted_data[s]
    
    cat("Indices: r =", r, ", s =", s, "\n")
    cat("Order statistics: X_(", r, ") =", ci_lower, ", X_(", s, ") =", ci_upper, "\n")
    cat("Confidence interval: [", ci_lower, ", ", ci_upper, "]\n")
    cat("Interval width:", best_width, "\n")
    cat("Exact coverage probability:", best_coverage, "\n\n")
    
    # Method 2: Normal approximation for large samples
    if(n > 20) {
        cat("2. NORMAL APPROXIMATION:\n")
        z <- qnorm(1 - alpha/2)
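        # Large-sample rule: indices near n*q -/+ z*sqrt(n*q*(1-q)) (one common variant)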
        r_approx <- floor(n * q - z * sqrt(n * q * (1 - q)))
        s_approx <- ceiling(n * q + z * sqrt(n * q * (1 - q)) + 1)
        
        r_approx <- max(1, min(r_approx, n))
        s_approx <- max(1, min(s_approx, n))
        
        ci_lower_approx <- sorted_data[r_approx]
        ci_upper_approx <- sorted_data[s_approx]
        
        cat("Approximate indices: r =", r_approx, ", s =", s_approx, "\n")
        cat("Approximate CI: [", ci_lower_approx, ", ", ci_upper_approx, "]\n\n")
    }
    
    return(list(
        exact_ci = c(ci_lower, ci_upper),
        exact_indices = c(r, s),
        exact_coverage = best_coverage,
        exact_width = best_width
    ))
}

# For median (50th percentile)
result_median <- enhanced_quantile_ci(reaction_times, q = 0.5, conf.level = 0.95)
## === QUANTILE CONFIDENCE INTERVAL ===
## Sample size: n = 15 
## Quantile: q = 0.5 
## Confidence level: 0.95 
## 
## 1. EXACT BINOMIAL METHOD:
## Indices: r = 4 , s = 12 
## Order statistics: X_( 4 ) = 273 , X_( 12 ) = 307 
## Confidence interval: [ 273 ,  307 ]
## Interval width: 34 
## Exact coverage probability: 0.9648438
# For 25th percentile
result_q25 <- enhanced_quantile_ci(reaction_times, q = 0.25, conf.level = 0.90)
## === QUANTILE CONFIDENCE INTERVAL ===
## Sample size: n = 15 
## Quantile: q = 0.25 
## Confidence level: 0.9 
## 
## 1. EXACT BINOMIAL METHOD:
## Indices: r = 2 , s = 8 
## Order statistics: X_( 2 ) = 256 , X_( 8 ) = 289 
## Confidence interval: [ 256 ,  289 ]
## Interval width: 33 
## Exact coverage probability: 0.9025194
# Assignment Question 2 Template
incomes <- c(45, 52, 38, 60, 47, 55, 42, 51, 49, 53, 58, 62, 55, 65, 60, 57, 63, 59, 61, 58)

# Multiple quantiles analysis
multiple_quantile_cis <- function(data, quantiles = c(0.25, 0.5, 0.75), conf.level = 0.95) {
    results <- list()
    
    cat("MULTIPLE QUANTILE CONFIDENCE INTERVALS\n")
    cat("=====================================\n\n")
    
    for(q in quantiles) {
        cat("Quantile: ", q*100, "th percentile\n", sep = "")
        result <- enhanced_quantile_ci(data, q = q, conf.level = conf.level)
        results[[paste0("q", q*100)]] <- result
        cat("\n")
    }
    
    return(results)
}

# Analyze all three quartiles
quartile_results <- multiple_quantile_cis(incomes)
## MULTIPLE QUANTILE CONFIDENCE INTERVALS
## =====================================
## 
## Quantile: 25th percentile
## === QUANTILE CONFIDENCE INTERVAL ===
## Sample size: n = 20 
## Quantile: q = 0.25 
## Confidence level: 0.95 
## 
## 1. EXACT BINOMIAL METHOD:
## Indices: r = 2 , s = 10 
## Order statistics: X_( 2 ) = 42 , X_( 10 ) = 55 
## Confidence interval: [ 42 ,  55 ]
## Interval width: 13 
## Exact coverage probability: 0.961823 
## 
## 
## Quantile: 50th percentile
## === QUANTILE CONFIDENCE INTERVAL ===
## Sample size: n = 20 
## Quantile: q = 0.5 
## Confidence level: 0.95 
## 
## 1. EXACT BINOMIAL METHOD:
## Indices: r = 6 , s = 15 
## Order statistics: X_( 6 ) = 51 , X_( 15 ) = 60 
## Confidence interval: [ 51 ,  60 ]
## Interval width: 9 
## Exact coverage probability: 0.9586105 
## 
## 
## Quantile: 75th percentile
## === QUANTILE CONFIDENCE INTERVAL ===
## Sample size: n = 20 
## Quantile: q = 0.75 
## Confidence level: 0.95 
## 
## 1. EXACT BINOMIAL METHOD:
## Indices: r = 11 , s = 19 
## Order statistics: X_( 11 ) = 57 , X_( 19 ) = 63 
## Confidence interval: [ 57 ,  63 ]
## Interval width: 6 
## Exact coverage probability: 0.961823
# Bootstrap comparison
if(require(boot)) {
    cat("BOOTSTRAP COMPARISON\n")
    cat("===================\n\n")
    
    for(q in c(0.25, 0.5, 0.75)) {
        quantile_boot <- function(data, indices, q) {
            return(quantile(data[indices], q))
        }
        
        boot_result <- boot(incomes, statistic = quantile_boot, R = 2000, q = q)
        boot_ci <- boot.ci(boot_result, conf = 0.95, type = "perc")
        
        cat(q*100, "th percentile bootstrap CI:\n", sep = "")
        cat("Lower:", boot_ci$percent[4], "Upper:", boot_ci$percent[5], "\n\n")
    }
}
## BOOTSTRAP COMPARISON
## ===================
## 
## 25th percentile bootstrap CI:
## Lower: 44.25 Upper: 55 
## 
## 50th percentile bootstrap CI:
## Lower: 51.5 Upper: 59.5 
## 
## 75th percentile bootstrap CI:
## Lower: 57 Upper: 62.25
# Visualization
plot_quantile_cis <- function(data, quantile_results) {
    sorted_data <- sort(data)
    n <- length(data)
    
    plot(1:n, sorted_data, type = "n", 
         xlab = "Order Statistics", ylab = "Data Values",
         main = "Quantile Confidence Intervals")
    
    points(1:n, sorted_data, pch = 19, col = "blue")
    
    colors <- c("red", "green", "purple")
    quantiles <- c(0.25, 0.5, 0.75)
    
    for(i in 1:length(quantiles)) {
        q <- quantiles[i]
        result <- quantile_results[[paste0("q", q*100)]]
        
        if(!is.null(result)) {
            r <- result$exact_indices[1]
            s <- result$exact_indices[2]
            
            segments(r, sorted_data[r], s, sorted_data[s], 
                     col = colors[i], lwd = 3)
            points(c(r, s), sorted_data[c(r, s)], 
                   col = colors[i], pch = 17, cex = 1.5)
        }
    }
    
    legend("topleft", 
           legend = c("Data", "Q1 CI", "Median CI", "Q3 CI"),
           col = c("blue", "red", "green", "purple"),
           pch = c(19, 17, 17, 17), lwd = c(NA, 3, 3, 3))
}

plot_quantile_cis(incomes, quartile_results)