Author

J Sigma

1. Introduction

1.1. Parametric Techniques

The statistical techniques we have used thus far are parametric techniques.

Important: Parametric Techniques

Parametric techniques are statistical techniques which

  • Assume that the data follow a particular distribution

  • Rely on the central limit theorem, for large sample sizes, to treat the sampling distribution of statistics such as the sample mean as approximately normal

  • Typically rely on quantitative data

  • Use the assumed distribution to derive the sampling distributions of test statistics, and make inferences about the unknown population parameters of that distribution

  • Focus heavily on parameters

Warning: Example

Suppose we want to estimate the mean height of first-year students at UCT, and the population standard deviation of the heights is also unknown.

By collecting a random sample, we find the sample mean and sample standard deviation, and assume that the test statistic follows a \(t\)-distribution. We can then calculate a test statistic for the mean height and use it to decide whether the result is significant at some \(\alpha\) level.

This relies heavily on the assumption that the heights are normally distributed.
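To make the parametric procedure concrete, here is a minimal R sketch of a one-sample \(t\)-test. The heights and the null value `mu0` are hypothetical, made-up numbers used only for illustration. The point is that the statistic is built from the sample mean and sample standard deviation and compared with a \(t\)-distribution on \(n-1\) degrees of freedom, which is only justified if the heights are (approximately) normal.

```r
# Hypothetical heights (cm) of 10 first-year students; illustrative values only
heights <- c(168, 175, 162, 181, 170, 166, 177, 172, 159, 174)
mu0 <- 170  # hypothetical null value for the population mean height

n    <- length(heights)
xbar <- mean(heights)   # sample mean
s    <- sd(heights)     # sample standard deviation (divisor n - 1)

# One-sample t statistic; under normality it follows a t-distribution with n - 1 df
t_stat <- (xbar - mu0) / (s / sqrt(n))
p_val  <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)  # two-sided p-value

# R's built-in version of the same test, for comparison
res <- t.test(heights, mu = mu0)
print(c(t = t_stat, p = p_val))
```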

1.2. Non-Parametric Techniques

Important: Non-Parametric Techniques

Non-parametric techniques are statistical techniques which are valid for a wide variety of underlying distributions, because they

  • Only make weak assumptions about the distribution of the data

  • Do not depend on parameters specific to a particular distribution

As such, we use non-parametric techniques when

  • we have non-normal quantitative data, or the distribution of the data is uncertain

  • we have small samples

  • we have qualitative data

Note

Non-parametric techniques can also be used when the data are normally distributed, so they are not limited to non-normal data.

However, if we know the underlying distribution of the data, it is better to use parametric techniques, since they give us more power, i.e., a higher probability of correctly rejecting a false null hypothesis.

So, non-parametric techniques are valid under much weaker assumptions, but they are not always the optimal choice for power.

1.3. Data Types

We differentiate between qualitative and quantitative data.

1.3.1. Qualitative (or Categorical) Data

Important: Definition (Qualitative Data)

Qualitative data refers to data that represent categories, labels, or levels of a factor. If the categories are coded as numbers, those numbers have no arithmetic meaning.

Examples of categorical data may include gender, nationality, blood type, colours, and many more. The values of the categories here describe what something is, and not how much of it there is.

We further divide qualitative data into nominal and ordinal data.

Warning: Nominal vs Ordinal Data

Nominal data refers to data which has categories that can be listed without any particular order, and this doesn’t change the meaning of the data. For example, we may measure the number of people belonging to blood groups A, B, AB, and O. Here, there is no notion of any blood group having a higher value than any other.

On the other hand, ordinal data refers to categorical data that has a clear order structure. For example, we may consider the year level of undergraduate students at university. We will have the categories 1st year, 2nd year, 3rd year, and so on, and there is a clear order to these groups.

1.3.2. Quantitative Data

Important: Definition (Quantitative Data)

Quantitative data represents measurable quantities, where numerical values have a meaningful arithmetic interpretation.

Examples of quantitative data include height, income, time, and the number of lectures attended in a course. These are not just labels; they carry meaningful magnitudes.

Similar to qualitative data, we further differentiate between interval and ratio-scaled quantitative data.

Warning: Interval vs Ratio-Scaled Data

Ratio-scaled quantitative data refers to quantitative data where the \(0\) value has true meaning, in that it represents an absence of the quantity being measured. Examples include height, weight, duration, and temperature measured in Kelvin (\(0\) K indicates an absence of thermal energy).

Also, ratios between values of this type of data are meaningful. We may say things like, \(20\) kilograms is twice as heavy as \(10\) kilograms.

Interval quantitative data refers to quantitative data where the zero value has no physical or real meaning. Here, the \(0\) value does not mean an absence of the quantity being measured. Examples include IQ score (an IQ of \(0\) does not represent an absence of intelligence), clock or calendar time, and temperature measured in degrees Celsius or Fahrenheit (\(0^{\circ}C\) or \(0^{\circ}F\) do not indicate an absence of temperature).

Ratios between values are not meaningful here. For example, \(20^{\circ}C\) is not twice as hot as \(10^{\circ}C\). To show this, we can convert both to Kelvin (since it has a meaningful zero) and compute the percentage change:

\(10^{\circ}C\approx283\,K\) and \(20^{\circ}C\approx293\,K\). So, the relative change is

\[ \frac{293-283}{283}=0.0353...\approx0.035 \]

This shows that \(20^{\circ}C\) is only about \(3.5\%\) hotter than \(10^{\circ}C\).
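As a quick check of the Kelvin argument, here is a small R sketch. The helper name `celsius_to_kelvin` is our own, and it uses the exact offset \(273.15\) rather than the rounded \(273\) above.

```r
# Hypothetical helper: convert degrees Celsius to Kelvin (a scale with a true zero)
celsius_to_kelvin <- function(celsius) celsius + 273.15

low  <- 10  # degrees Celsius
high <- 20  # degrees Celsius

naive_ratio  <- high / low  # 2, but meaningless on an interval scale
kelvin_ratio <- celsius_to_kelvin(high) / celsius_to_kelvin(low)  # ratio on the ratio scale
pct_increase <- (kelvin_ratio - 1) * 100  # percentage increase relative to 10 degrees C

print(c(naive = naive_ratio, kelvin = kelvin_ratio, pct = pct_increase))
```

Using \(273.15\) instead of \(273\) changes the answer only slightly; either way the increase is about \(3.5\%\), not \(100\%\).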

1.4. Overview of Non-Parametric Tests

1.4.1. Single Population Tests

Test | Data Type | Data
Tests for Randomness of Order | Nominal | Independent observations
Chi-Square Goodness of Fit Test | Nominal | Independent observations

1.4.2. Two Population Tests

Test for Equality of Medians | Data Type | Data | Parametric Test Equivalent
Wilcoxon Rank Sum (Mann-Whitney U) Test | Ordinal or non-normal quantitative data | Independent samples | Independent samples \(t\)-test
Wilcoxon Signed Rank Sum Test | Non-normal quantitative data | Matched/paired samples | Matched pairs \(t\)-test
Sign Test | Ordinal data | Matched/paired samples | Matched pairs \(t\)-test

1.4.3. Three or More Population Tests

Test | Data Type | Data | Parametric Test Equivalent
Kruskal-Wallis Test | Ordinal or non-normal quantitative data | Independent samples | One-way ANOVA
Friedman Test | Ordinal or non-normal quantitative data | Matched/blocked samples | Two-way ANOVA without interactions

1.4.4. For Relationship Between Two Variables

Test | Data Type | Data | Parametric Test Equivalent
Spearman’s Rank Correlation Test | Ordinal or non-normal quantitative data | Paired observations | Pearson’s correlation

1.5. Ranking Data

Since non-parametric techniques work with the ranks of the observations rather than their numerical values, we need to understand how to rank data. We take the following steps:

  1. We begin by ordering the data (usually in ascending order)

  2. We identify the relative position of each value in the ordered data

  3. We look out for ties

    If there are no ties, each value's rank is its relative position. However, if there are ties, each of the tied values is assigned the average of the positions they occupy

Warning: Example One

Suppose we are given the data values \(4,9,6,7,5,2,8\). Then, we would assign ranks as follows:

Data 4 9 6 7 5 2 8
Ordered 2 4 5 6 7 8 9
Relative Position 1 2 3 4 5 6 7

There are no ties in this data, and so we rank via the relative positions.

Warning: Example Two

Suppose now that the data contain repeated values:

Data 29 18 29 19 20 21 20 33 30 23 33 33 24
Ordered 18 19 20 20 21 23 24 29 29 30 33 33 33
Relative Position 1 2 3 4 5 6 7 8 9 10 11 12 13
Ranks 1 2 3.5 3.5 5 6 7 8.5 8.5 10 12 12 12

There are three groups of tied values in this data:

  • \(20\) and \(20\) \(\implies\) \(\displaystyle{\frac{3+4}{2}}=3.5\)

  • \(29\) and \(29\) \(\implies\) \(\displaystyle{\frac{8+9}{2}}=8.5\)

  • \(33\), \(33\), and \(33\) \(\implies\) \(\displaystyle{\frac{11+12+13}{3}}=12\)
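In R, both examples can be checked with the built-in `rank()` function, whose `ties.method = "average"` option implements exactly the averaging rule above. A short sketch:

```r
example_one <- c(4, 9, 6, 7, 5, 2, 8)
example_two <- c(29, 18, 29, 19, 20, 21, 20, 33, 30, 23, 33, 33, 24)

# rank() gives each value its position in the sorted data;
# ties.method = "average" assigns tied values the mean of their positions
ranks_one <- rank(example_one, ties.method = "average")

ranks_sorted   <- rank(sort(example_two), ties.method = "average")  # the "Ranks" row above
ranks_original <- rank(example_two, ties.method = "average")        # same ranks, original order

# Sanity check: ranks always sum to n(n + 1) / 2, with or without ties
n <- length(example_two)
print(sum(ranks_original) == n * (n + 1) / 2)
```

The last line is a useful habit: if the ranks do not sum to \(n(n+1)/2\), a value has been mis-ranked.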

2. Wilcoxon Signed Rank Sum Test

Important: Key Idea

A Wilcoxon Signed Rank Sum Test is used for comparing two matched (dependent) samples of quantitative data (interval or ratio) with respect to central location.

It tests whether the paired differences are centred at zero, i.e., whether the two matched measurements share the same central location.

Recall that, in the parametric setting, we assumed that the paired differences were normally distributed around their mean. So, we would perform a paired \(t\)-test, computing the mean of the paired differences and testing whether it differs significantly from zero.

Since we now make only weak assumptions about the distribution of the data, we use the median as the central location of the differences. This means that we test whether the median of the paired differences is significantly different from zero.

2.1. Hypotheses

The null hypothesis, \(H_{0}\), always states that there is no difference between the two groups, i.e., that the population median of the paired differences is zero. For the alternative hypothesis, we can have a one-sided or a two-sided hypothesis. So,

\[ H_{0}: \text{median of differences}=0 \text{ (i.e., no difference between samples)} \]

\[ \text{and one of} \]

\[ H_{1}: \text{median of differences} \neq 0 \text{ (i.e., there is a difference between samples)} \]

\[ H_{1}: \text{median of differences}>0 \text{ (i.e., sample one has higher values than sample two)} \]

\[ H_{1}: \text{median of differences}<0 \text{ (i.e., sample two has higher values than sample one)} \]

Here, each difference is computed as the sample one value minus the sample two value.

Tip

Always state the null and alternative hypotheses in a way that reflects all the information given in the question. Make sure that your hypotheses are one-sided or two-sided as the context requires, and that they are not generic. If the question concerns the finishing times of racers in a race, for example, then this should be evident in your hypotheses.

2.2. Data and Assumptions

  1. Two paired samples
  2. Quantitative data (interval or ratio)
  3. The distribution of the paired differences is symmetric about its median (which is \(0\) under \(H_{0}\))
  4. The \(n\) paired differences are independent and random

2.3. Calculating the Test Statistic

Important: Test Statistic (Wilcoxon Signed Rank Sum Test)

We take the following steps:

  1. Begin by calculating the difference for each pair.

  2. Exclude the pairs with a difference of \(0\).

  3. Record \(n\). This is the number of non-zero differences.

  4. Rank the absolute values of the differences, averaging the ranks of any ties.

  5. Attach the sign of each paired difference to its rank.

  6. The test statistic is then given by

    \[ W=\text{Sum of the Signed Ranks} \]

Question: Why does this work?

Answer: Under the assumption that \(H_{0}\) is true, the differences will be symmetrically distributed around \(0\). If we take the ranks of the absolute differences, then each rank is equally likely to carry a \(+\) or a \(-\) sign.

Roughly, the positive and negative ranks will cancel out, and so we obtain that

\[ W\approx0 \]

So, \(W\) answers the question of whether the positive and negative differences are balanced symmetrically around \(0\) (i.e., whether \(\text{median of differences}=0\)).

Question: Why, though, are we opposed to taking the numerical differences and seeing if there is a true difference between the two samples?

Answer: That is what a paired \(t\)-test does. However, it assumes that the differences come from a normal distribution. So, we want a test that captures the same idea without assuming that the data follow a normal distribution.

Warning: Worked Example Part I: (The Placebo)

Before studying, a group of \(6\) students is told that they are trying a new drink that supposedly improves concentration.

In reality, the drink is just flavoured water.

Each student writes a short test:

  • Before drinking it; and

  • After drinking it

and their test scores are recorded.

Person Before After
1 80.0 78.6
2 73.5 76.0
3 85.0 81.2
4 69.0 74.1
5 77.8 78.5
6 90.2 86.0

Is there consistent evidence that the drink improved performance at all?

Note: For now, we are only focused on how we obtain the test statistic, and not on how we would conduct the whole hypothesis test.

Here, \(d_{i}\) is the difference (Before minus After) for each person and \(|d_{i}|\) is its absolute value.

Person Before After \(d_{i}\) \(|d_{i}|\) Ordered Rank Sign Signed Ranks
1 \(80.0\) \(78.6\) \(+1.4\) \(1.4\) \(0.7\) \(1\) \(-\) \(-1\)
2 \(73.5\) \(76.0\) \(-2.5\) \(2.5\) \(1.4\) \(2\) \(+\) \(+2\)
3 \(85.0\) \(81.2\) \(+3.8\) \(3.8\) \(2.5\) \(3\) \(-\) \(-3\)
4 \(69.0\) \(74.1\) \(-5.1\) \(5.1\) \(3.8\) \(4\) \(+\) \(+4\)
5 \(77.8\) \(78.5\) \(-0.7\) \(0.7\) \(4.2\) \(5\) \(+\) \(+5\)
6 \(90.2\) \(86.0\) \(+4.2\) \(4.2\) \(5.1\) \(6\) \(-\) \(-6\)

Note: The Ordered column lists the absolute differences sorted in ascending order, so the values in the Ordered, Rank, Sign, and Signed Ranks columns do not belong to the person in the same row. When finding the signed ranks, match each ordered value back to the person it came from so that its sign is correct.

We find the test statistic as

\[ W=-1+2-3+4+5-6=+1 \]

We see, here, that the \(+\) and \(-\) signs are scattered across the small and large ranks, and the result is that \(W=1\). This is very close to zero, so we would not expect to find significant evidence against the null hypothesis of no difference in performance. The placebo does not appear to have improved performance.
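The steps for computing \(W\) can be sketched as a small R function. `signed_rank_W` is a name made up for illustration (not a built-in), and it follows these notes' definition of \(W\) as the sum of signed ranks; some texts, and R's `wilcox.test`, use the sum of positive ranks instead. Applied to the Part I data, it reproduces \(W=+1\).

```r
# Sketch of the signed-rank statistic W as defined in these notes
signed_rank_W <- function(first, second) {
  d <- round(first - second, 8)  # paired differences; rounding removes floating-point noise
  d <- d[d != 0]                 # exclude zero differences
  r <- rank(abs(d))              # rank |d|; R averages the ranks of ties by default
  sum(sign(d) * r)               # attach each difference's sign to its rank, then add
}

before <- c(80.0, 73.5, 85.0, 69.0, 77.8, 90.2)
after  <- c(78.6, 76.0, 81.2, 74.1, 78.5, 86.0)

W_part1 <- signed_rank_W(before, after)
print(W_part1)
```

The rounding step matters in practice: differences such as \(80.0-78.6\) are not stored exactly, and without rounding two differences that should tie can end up with slightly different ranks.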

Warning: Worked Example Part II: (The Placebo)

A second group of students takes the same focus booster drink before writing a similar test. Again, the scores are recorded:

  • Before drinking; and

  • After drinking

Person Before After \(d_{i}\) \(|d_{i}|\) Order Rank Sign Signed Ranks
1 \(79.3\) \(82.5\) \(-3.2\) \(3.2\) \(1.1\) \(1\) \(+\) \(+1\)
2 \(69.1\) \(68.0\) \(1.1\) \(1.1\) \(2\) \(2\) \(-\) \(-2\)
3 \(85.4\) \(91.2\) \(-5.8\) \(5.8\) \(3.2\) \(3\) \(-\) \(-3\)
4 \(73.0\) \(75.0\) \(-2\) \(2\) \(4.5\) \(4\) \(-\) \(-4\)
5 \(83.8\) \(88.3\) \(-4.5\) \(4.5\) \(5.8\) \(5\) \(-\) \(-5\)
6 \(62.8\) \(70.1\) \(-7.3\) \(7.3\) \(7.3\) \(6\) \(-\) \(-6\)

And we find that our test statistic is

\[ W=1-2-3-4-5-6=-19 \]

This is far from zero (with \(6\) ranks, the most extreme possible values are \(\pm21\)), and suggests that the test scores after drinking the focus booster tend to be higher than the scores before drinking it.
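To see how extreme \(W=-19\) is, we can compute it in R and then enumerate all \(2^{6}=64\) equally likely sign assignments of the ranks \(1,\dots,6\) under \(H_{0}\). This is a sketch that assumes there are no ties (true for this data), so the ranks are exactly \(1\) to \(n\).

```r
before <- c(79.3, 69.1, 85.4, 73.0, 83.8, 62.8)
after  <- c(82.5, 68.0, 91.2, 75.0, 88.3, 70.1)

d <- round(before - after, 8)   # differences (before minus after), rounded to remove float noise
d <- d[d != 0]                  # drop zero differences
W_part2 <- sum(sign(d) * rank(abs(d)))

# Null distribution: every +/- pattern on ranks 1..n is equally likely (no ties assumed)
n <- length(d)
signs  <- as.matrix(expand.grid(rep(list(c(-1, 1)), n)))
W_null <- as.vector(signs %*% seq_len(n))

# Left-tailed exact p-value: proportion of sign patterns giving W at least as small as observed
p_lower <- mean(W_null <= W_part2)
print(c(W = W_part2, p = p_lower))
```

Only \(2\) of the \(64\) patterns give \(W \leq -19\) (all signs negative, or only rank \(1\) positive), so the exact left-tailed \(p\)-value is \(2/64 \approx 0.031\).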

2.4. So, What is \(W\) Really Measuring?

\(W\) measures whether the signs are distributed randomly across the ranks, or whether there is a pattern favouring one particular side. We have the following:

  • \(W\approx 0 \implies \text{no evidence of a pattern: signs are random}\)

  • \(W\gg 0 \implies \text{first sample (before) tends to be larger}\)

  • \(W\ll 0 \implies \text{second sample (after) tends to be larger}\)

So, in the test for a significant difference, we ask how extreme \(W\) is under \(H_{0}\), i.e., how different \(W\) is from zero.

2.5. Sampling Distribution of \(W\)

Note

The sampling distribution is the distribution of a statistic (in this case, \(W\)) over all the possible samples (or outcomes) under a given assumption.

For \(W\), the sampling distribution is given by all the possible ways of assigning \(+\) or \(-\) signs to the ranks since we assume, under the null hypothesis, that the signs of the differences are random, i.e., \(\text{median of differences}=0\)

Under \(H_{0}\), the signs of the ranked differences are random, so each rank is equally likely to be positive or negative. For small sample sizes with no ties, we can therefore find the sampling distribution exactly by enumerating all possible sign assignments.

Warning: Example

Suppose that a data set has \(3\) values. Then, we will get \(3\) ranks from this data set. Let these ranks be \(1,2, \text{ and }3\).

Each rank can take on a \(+\) or \(-\) sign. So, the total number of combinations of \(+\) and \(-\) signs is going to be given by

\[ 2^{3}=8 \]

The following table lists all \(8\) sign assignments (columns 1, 2, and 3 give the sign attached to each rank) and the resulting value of \(W\):

1 2 3 W
\(+\) \(+\) \(+\) \(1+2+3=6\)
\(-\) \(+\) \(+\) \(-1+2+3=4\)
\(+\) \(-\) \(+\) \(1-2+3=2\)
\(+\) \(+\) \(-\) \(1+2-3=0\)
\(-\) \(-\) \(+\) \(-1-2+3=0\)
\(-\) \(+\) \(-\) \(-1+2-3=-2\)
\(+\) \(-\) \(-\) \(1-2-3=-4\)
\(-\) \(-\) \(-\) \(-1-2-3=-6\)

Since each of the \(8\) sign patterns is equally likely under \(H_{0}\), dividing the number of times each value of \(W\) occurs by \(8\) gives the sampling distribution of \(W\):

Code
#################################
# SAMPLING DISTRIBUTION OF W
#################################

W <- c(-6, -4, -2, 0, 2, 4, 6)
prob <- c(1, 1, 1, 2, 1, 1, 1) / 8


barplot(prob,
        names.arg = W,
        xlab = "W",
        ylab = "Proportion",
        main = "Proportion of Signed Differences")

You can see how this gets out of hand for larger sample sizes: with \(9\) ranks, there are \(2^{9}=512\) possible sign assignments, each giving a value of \(W\). In this case, it is useful to let R do the enumeration. Here is an example of code that does this:

Code
##############################################
# SAMPLING DISTRIBUTION FOR W WITH 9 RANKS
##############################################

# Function to compute sampling distribution of W
wilcoxon_W_dist <- function(n) {
  ranks <- 1:n
  
  # Generate all ±1 combinations
  signs <- expand.grid(rep(list(c(-1, 1)), n))
  
  # Compute W = sum of signed ranks for each sign pattern (as defined in these notes)
  W <- apply(signs, 1, function(s) sum(s * ranks))
  
  # Convert to probability distribution
  dist <- table(W) / length(W)
  
  return(dist)
}

# Case n = 9
dist9 <- wilcoxon_W_dist(9)

# Plotting the distribution

barplot(dist9,
        xlab = "W",
        ylab = "Proportion",
        main = "Sampling Distribution of W (n = 9)")

Notice that as the number of ranks \(n\) increases, the sampling distribution becomes smoother and more bell-shaped. This suggests that for larger values of \(n\), the sampling distribution of \(W\) resembles a normal distribution.

In fact, for large sample sizes (\(n>10\)), the sampling distribution of \(W\) can be approximated by a normal distribution with

  • a mean of \(\mu_{W}=0\); and

  • a standard deviation of \(\sigma_{W}=\displaystyle{\sqrt{\frac{n(n+1)(2n+1)}{6}}}\)
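The mean and standard deviation above can be checked by brute force. The sketch below enumerates all \(512\) sign patterns for \(n=9\) and compares the exact variance of \(W\) with \(n(n+1)(2n+1)/6\); note that this expression is the variance, so \(\sigma_{W}\) is its square root.

```r
n <- 9
ranks <- seq_len(n)
signs <- as.matrix(expand.grid(rep(list(c(-1, 1)), n)))  # all 2^n sign patterns
W <- as.vector(signs %*% ranks)                         # signed-rank sum for each pattern

mean_W <- mean(W)                          # exact mean over equally likely patterns
var_W  <- mean((W - mean_W)^2)             # exact (population) variance over the patterns
formula_var <- n * (n + 1) * (2 * n + 1) / 6

print(c(mean = mean_W, var = var_W, formula = formula_var, sd = sqrt(var_W)))
```

The variance works out to \(\sum_{i=1}^{n} i^{2}\) because each rank contributes \(\pm i\) independently with equal probability.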

For this test, we can use either a

  1. two-sided test, where we reject \(H_{0}\) if \(|z|>z_{\frac{\alpha}{2}}\); or
  2. one-sided test, where we reject \(H_{0}\) if \(z>z_{\alpha} \text{ (right-tailed)}\) or \(z<-z_{\alpha} \text{ (left-tailed)}\)

Here, \(z=\displaystyle{\frac{W-\mu_{W}}{\sigma_{W}}}\), which equals \(\displaystyle{\frac{W}{\sigma_{W}}}\) under \(H_{0}\).

We could also use a \(p\)-value approach whereby we find the \(p\)-value corresponding to the calculated test statistic. In this case, we reject \(H_{0}\) if \(p<\alpha\), for a given significance level \(\alpha\).

Warning: Worked Example (A case of larger values of n)

In the following, we are trying to answer the question of whether a “flexi-time” work schedule helps to reduce the travel time of workers.

Note: Your brain should immediately be notifying you that this will be a one-sided test, since we are looking for a reduction in the variable of interest.

A random sample of \(32\) workers was selected, and the workers recorded their travel times in minutes before and after the program was implemented.

Using the modified \(p\)-value approach, test at the \(5\%\) significance level. The full Wilcoxon table is given below.

Code
#########################
# DATA FOR FLEXI-TIME
#########################

data <- data.frame(
  Worker = 1:32,
  normal_arrival = c(34,35,43,46,16,26,68,38,61,52,68,13,69,18,53,18,
            41,25,17,26,44,30,19,48,29,24,51,40,26,20,19,42),
  Flextime = c(31,31,44,44,15,28,63,39,63,54,65,12,71,13,55,19,
               38,23,14,21,40,33,18,51,33,21,50,38,22,19,21,38)
)


library(dplyr)
library(gt)

data %>%
  mutate(
    difference = normal_arrival - Flextime,
    abs_difference = abs(difference)
  ) %>%
  filter(difference != 0) %>%
  mutate(
    rank = rank(abs_difference, ties.method = "average"),
    signed_rank = rank * sign(difference)
  ) %>%
  gt()
Worker normal_arrival Flextime difference abs_difference rank signed_rank
1 34 31 3 3 21.0 21.0
2 35 31 4 4 27.0 27.0
3 43 44 -1 1 4.5 -4.5
4 46 44 2 2 13.0 13.0
5 16 15 1 1 4.5 4.5
6 26 28 -2 2 13.0 -13.0
7 68 63 5 5 31.0 31.0
8 38 39 -1 1 4.5 -4.5
9 61 63 -2 2 13.0 -13.0
10 52 54 -2 2 13.0 -13.0
11 68 65 3 3 21.0 21.0
12 13 12 1 1 4.5 4.5
13 69 71 -2 2 13.0 -13.0
14 18 13 5 5 31.0 31.0
15 53 55 -2 2 13.0 -13.0
16 18 19 -1 1 4.5 -4.5
17 41 38 3 3 21.0 21.0
18 25 23 2 2 13.0 13.0
19 17 14 3 3 21.0 21.0
20 26 21 5 5 31.0 31.0
21 44 40 4 4 27.0 27.0
22 30 33 -3 3 21.0 -21.0
23 19 18 1 1 4.5 4.5
24 48 51 -3 3 21.0 -21.0
25 29 33 -4 4 27.0 -27.0
26 24 21 3 3 21.0 21.0
27 51 50 1 1 4.5 4.5
28 40 38 2 2 13.0 13.0
29 26 22 4 4 27.0 27.0
30 20 19 1 1 4.5 4.5
31 19 21 -2 2 13.0 -13.0
32 42 38 4 4 27.0 27.0

We have the following hypotheses:

\(H_{0}: \text{There is no difference in travel time to work between the normal and Flexi-time work programs}\)

\(H_{1}:\text{Workers take longer to travel to work under normal work hours}\) (i.e., the median of the differences, normal minus Flexi-time, is greater than \(0\))

and we are given a significance level of \(\alpha=0.05\).

Here, \(n=\text{number of non-zero differences}=32>10\). So, we can use the normal approximation to the sampling distribution of \(W\). We calculate the test statistic as

\[ W=\sum_{i=1}^{32} \text{rank}(d_{i})\cdot \text{sgn}(d_{i})=207 \]

We can then calculate the \(z\)-score associated with this test statistic as

\[ z=\frac{W-\mu_{W}}{\sigma_{W}}=\frac{207-0}{\sqrt{\frac{(32)(33)(65)}{6}}}=1.935 \]

Note: Under the assumption that \(H_{0}\) is true, we have that \(\mu_{W}=0\)

We can then find the \(p\)-value of the test statistic. This is a right-tailed test: we are looking for a reduction in travel time under Flexi-time, so we expect \(d=\text{normal}-\text{Flexi-time}>0\), which makes \(W\) large and positive. Using this understanding, we can calculate the \(p\)-value using R

Code
#############################
# FINDING P-VALUE
#############################

p <- pnorm(1.935, lower.tail = FALSE)  # right-tailed p-value

p
[1] 0.02649515

Conclusion:

Since the \(p\)-value is less than \(0.05\), we reject the null hypothesis. We conclude that there is significant evidence that workers take longer to travel to work under the normal work-hour program than under a Flexi-time schedule, i.e., the median difference (normal minus Flexi-time) is greater than zero.
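The whole calculation can be reproduced in a few lines of R. This is a sketch using the vectors from the table above. For comparison it also calls R's `wilcox.test`, which reports \(V\), the sum of the positive ranks, rather than the signed-rank sum \(W\) used in these notes, and which corrects the variance for ties, so its \(p\)-value differs slightly from the hand calculation.

```r
# Travel times (minutes) from the table above
normal_arrival <- c(34,35,43,46,16,26,68,38,61,52,68,13,69,18,53,18,
                    41,25,17,26,44,30,19,48,29,24,51,40,26,20,19,42)
flextime       <- c(31,31,44,44,15,28,63,39,63,54,65,12,71,13,55,19,
                    38,23,14,21,40,33,18,51,33,21,50,38,22,19,21,38)

d <- normal_arrival - flextime
d <- d[d != 0]                                  # keep non-zero differences only
n <- length(d)

W       <- sum(sign(d) * rank(abs(d)))          # signed-rank sum, as in these notes
sigma_W <- sqrt(n * (n + 1) * (2 * n + 1) / 6)  # standard deviation under H0 (no tie correction)
z       <- W / sigma_W                          # mu_W = 0 under H0
p_value <- pnorm(z, lower.tail = FALSE)         # right-tailed: H1 says median difference > 0

# Built-in comparison: V = sum of positive ranks; variance is corrected for ties
res <- wilcox.test(normal_arrival, flextime, paired = TRUE,
                   alternative = "greater", exact = FALSE, correct = FALSE)

print(c(n = n, W = W, z = z, p = p_value, V = unname(res$statistic), p_builtin = res$p.value))
```

Here \(V=367.5\). Since all the ranks together sum to \(32(33)/2=528\), this is equivalent to \(W=2V-528=207\).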

3. Mann-Whitney-U Test

Important: Key Idea

The Mann-Whitney-U Test (or U Test, Wilcoxon Rank Sum Test, or just Rank Sum Test) is used to determine whether two independent samples of ordinal or quantitative data have the same central location (median).

This test is the non-parametric counterpart of the independent samples \(t\)-test for two samples of normal data.

3.1. Data and Assumptions

  1. We have two random samples of size \(n_{1}\) and \(n_{2}\)
  2. The data are either ordinal or quantitative, and not necessarily normal
  3. Samples and observations within samples are independent
  4. The distributions of the two populations differ with respect to location only (if they differ at all)

3.2. Hypothesis Testing for Mann-Whitney-U Tests

3.2.1. Hypotheses

We differentiate between one-sided and two-sided hypotheses. So:

For a two-sided test:

\[ H_{0}:\text{the two population medians are the same} \]

\[ \text{and} \]

\[ H_{1}: \text{the two population medians are different} \]

For a one-sided test:

\[ H_{0}:\text{the two population medians are the same} \]

\[ \text{and one of} \]

\[ H_{1}: \text{the location of the first population is to the right of the second population} \]

\[ H_{1}:\text{the location of the first population is to the left of the second population} \]

3.2.2. Calculating the Test Statistic

The test statistic here depends on \(n_{1}\) and \(n_{2}\). We find it in the following way:

  1. Combine the two samples into a single set of values

  2. Rank all observations from smallest to largest, i.e., from \(1\) to \(n_{1}+n_{2}\), averaging the ranks of tied observations

  3. Calculate the sums of the ranks, \(T_{1}=\text{sum of ranks for sample } 1\) and \(T_{2}=\text{sum of ranks for sample } 2\)

  4. We calculate two statistics:

    \[ U_{1}=T_{1}-\frac{n_{1}(n_{1}+1)}{2} \]

    \[ \text{and} \]

    \[ U_{2}=T_{2}-\frac{n_{2}(n_{2}+1)}{2} \]

  5. We find

    \[ U=\text{min}(U_{1}, U_{2}) \]

    and take the rank sum that corresponds to it as the test statistic. So, if \(\text{min}(U_{1},U_{2})=U_{1}\), then \(T=T_{1}\) is the test statistic.

3.2.3 Conclusion: The Logic

If the locations of the two populations are about the same, we would expect the ranks to be evenly spread between the samples, so that \(T_{1}\) and \(T_{2}\) are close (after allowing for any difference in sample size).

If \(T_{1}\) is sufficiently small, then most of the smaller observations are in sample \(1\). We then conclude that the location of population \(1\) is to the left of that of population \(2\), and reject \(H_{0}\).

On the other hand, if \(T_{1}\) is sufficiently large, then most of the larger observations are in sample \(1\). We conclude, therefore, that the location of population \(1\) is to the right of that of population \(2\).

Warning: Worked Example

Suppose we have the following samples:

\(\text{Sample 1}=\{0, 1, 1,0,1,2,1,2,3\}\)

\(\text{Sample 2}=\{7, 9, 10, 8, 10, 11, 10, 11, 12\}\)

We can combine the two samples into one set of values and rank the new set of values.

0 0 1 1 1 1 2 2 3 7 8 9 10 10 10 11 11 12
1.5 1.5 4.5 4.5 4.5 4.5 7.5 7.5 9 10 11 12 14 14 14 16.5 16.5 18

Without even calculating the test statistic, we can see that the ranks of sample 2 are much larger than those of sample 1. We can concretely show this. We get

\[ T_{1}=45 \quad \text{and} \quad T_{2}=126 \]

and so we obtain that

\[ U_{1}=0 \quad U_{2}=81 \]

Clearly, then, the test statistic is going to be

\[ T=45 \]

Since we have small sample sizes (\(n_{1}, n_{2}<10\)), we use the Mann-Whitney table to find the rejection region. We use \(\alpha=0.05\) for this case.

We define \(T_{L}\), the lower critical value. It is obtained from the table at the appropriate \(\alpha\) level, at the intersection of the two sample sizes (for small samples). In this case

\[ T_{L}=63 \]

We define \(T_{U}=n_{1}(n_{1}+n_{2}+1)-T_{L}\). In our case, we get that

\[ T_{U}=(9)(9+9+1)-63=108 \]

We reject \(H_{0}\) if \(T \leq T_{L}\) or \(T \geq T_{U}\).

In our case, we reject the null hypothesis since \((T=45) \leq (T_{L}=63)\). We then conclude that the location of sample one is to the left of the location of sample two. Most of the observations in sample one are smaller than the observations in sample two.
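The steps for the rank sum statistics can be sketched in R as follows. `rank_sum_stats` is a hypothetical helper written for illustration, and it encodes the rule above of taking the rank sum that matches the smaller \(U\). R's built-in `wilcox.test` reports its two-sample statistic (labelled W) as \(U_{1}\), computed for the first sample.

```r
# Hypothetical helper implementing the steps above
rank_sum_stats <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  r  <- rank(c(x, y))                  # rank the combined sample; ties get average ranks
  T1 <- sum(r[seq_len(n1)])            # rank sum of sample 1
  T2 <- sum(r[n1 + seq_len(n2)])       # rank sum of sample 2
  U1 <- T1 - n1 * (n1 + 1) / 2
  U2 <- T2 - n2 * (n2 + 1) / 2
  T_stat <- if (U1 <= U2) T1 else T2   # rank sum matching the smaller U, as in the notes
  c(T1 = T1, T2 = T2, U1 = U1, U2 = U2, T = T_stat)
}

sample1 <- c(0, 1, 1, 0, 1, 2, 1, 2, 3)
sample2 <- c(7, 9, 10, 8, 10, 11, 10, 11, 12)
stats <- rank_sum_stats(sample1, sample2)
print(stats)

# Built-in test: its statistic (labelled W) equals U1 for the first sample.
# exact = FALSE because the data contain ties.
res <- wilcox.test(sample1, sample2, exact = FALSE)
print(unname(res$statistic))
```

A quick consistency check is that \(U_{1}+U_{2}=n_{1}n_{2}\); here \(0+81=9\times9\).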

For large sample sizes, where \(n_{1}\) or \(n_{2}\) is \(10\) or more, the sampling distribution of the test statistic can be approximated by a normal distribution.

Note: Mann-Whitney-U Test for Large Sample Sizes

Since the sampling distribution of \(T\) is approximately normal for large sample sizes, we can standardise \(T\) to obtain a \(z\)-score:

\[ z=\frac{T-\mu_{T}}{\sigma_{T}} \]

where

\[ \mu_{T}=\frac{n_{1}(n_{1}+n_{2}+1)}{2} \quad \text{and} \quad \sigma_{T}=\sqrt{\frac{n_{1}n_{2}(n_{1}+n_{2}+1)}{12}} \]

These formulas are for \(T=T_{1}\), the rank sum of sample \(1\); for \(T_{2}\), swap \(n_{1}\) and \(n_{2}\) in \(\mu_{T}\).

Then, we reject the null hypothesis if:

  • \(|z| \geq z_{\alpha/2}\) for a two-sided test

  • \(z>z_{\alpha}\) for a right-tailed test

  • \(z<-z_{\alpha}\) for a left-tailed test

  • \(p \leq \alpha\)

The sample sizes \(n_{1}\) and \(n_{2}\) do not need to be equal. Because the test statistic is based on rank sums adjusted for each sample size, this is not a problem, and we proceed exactly as before.

Warning: Worked Example Two

The ABC Company has sent \(13\) of its employees to a privately run programme providing word-processing skills training. Six of the employees were from the data-processing (DP) department, and the rest were from the Typing (T) pool.

At the end of the programme, the company received a report indicating the score received by each of the employees, out of a total possible score of \(100\).

We have the following:

DP | T
70 | 59
52 | 70
46 | 75
65 | 85
60 | 50
40 | 82
   | 64

Is there a difference in the performance of the two groups in the word-processing programme? Test at a \(5\%\) significance level.

We state the null and alternative hypotheses as

\[ H_{0}: \text{There is no difference in performance between the two groups} \]

\[ \text{and} \]

\[ H_{1}:\text{There is a difference in performance between the two groups} \]

We are given that \(\alpha=0.05\). We then combine the two samples for ranking. This gives us the following:

Data  70 52 46 65 60 40 59 70 75 85 50 82 64
Ordered 40 46 50 52 59 60 64 65 70 70 75 82 85
Rank 1 2 3 4 5 6 7 8 9.5 9.5 11 12 13

We find, from this, that \(T_{1}=30.5\) and \(T_{2}=60.5\). Then,

\[ U_{1}=30.5-\frac{6(7)}{2}=9.5 \quad \text{and} \quad U_{2}=60.5-\frac{7(8)}{2}=32.5 \]

This gives the test statistic as

\[ T=30.5 \]

To find the rejection region, we note that \(n_{1}=6\) and \(n_{2}=7\). Since these are both less than \(10\), we can obtain \(T_{L}\) using the Mann-Whitney-U table. We get that

\[T_{L}=28\]

Then,

\[ T_{U}=6(6+7+1)-28=56 \]

Conclusion: We find that \(T_{L} \leq T \leq T_{U}\). So, we fail to reject the null hypothesis, and conclude that there is no evidence of a significant difference in performance between the two groups in the word-processing skills training programme.
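A quick R check of the numbers above. This is a sketch; \(T_{L}=28\) is copied from the table value quoted in the text, not computed.

```r
dp     <- c(70, 52, 46, 65, 60, 40)          # sample 1 (DP department)
typing <- c(59, 70, 75, 85, 50, 82, 64)      # sample 2 (Typing pool)
n1 <- length(dp)
n2 <- length(typing)

r  <- rank(c(dp, typing))                    # the two 70s share rank 9.5
T1 <- sum(r[seq_len(n1)])
T2 <- sum(r[-seq_len(n1)])
U1 <- T1 - n1 * (n1 + 1) / 2
U2 <- T2 - n2 * (n2 + 1) / 2

T_L <- 28                                    # lower critical value, taken from the table in the text
T_U <- n1 * (n1 + n2 + 1) - T_L              # upper critical value
reject <- T1 <= T_L || T1 >= T_U             # U1 < U2, so T = T1 is the test statistic

print(c(T1 = T1, T2 = T2, U1 = U1, U2 = U2, T_U = T_U, reject = reject))
```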

Warning: Worked Example Three

A pharmaceutical company is planning to introduce a new painkiller. To determine the effectiveness of the drug in comparison to aspirin, \(30\) people were randomly selected.

  • \(15\) people were given the new drug (Sample \(1\))

  • \(15\) people were given aspirin (Sample \(2\))

Each participant was asked to indicate which one of the five statements best represented the effectiveness of the drug they took. The statements are as follows:

The drug taken was…

  • (5) Extremely effective

  • (4) Quite effective

  • (3) Somewhat effective

  • (2) Slightly effective

  • (1) Not effective

Note: This is ordinal data

The ratings were recorded as follows

New Drug Aspirin
3 4
5 1
4 3
3 2
2 4
5 1
1 3
4 4
5 2
3 2
3 2
5 4
5 3
5 4
4 5

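At the \(5\%\) significance level, is the new drug perceived to be more effective than aspirin?

As usual, we start with the null and alternative hypotheses:

\[ H_{0}:\text{there is no difference in the perceived effectiveness of the two painkillers} \]

\[ H_{1}: \text{the new drug is perceived to be more effective than aspirin} \]

i.e., under \(H_{1}\), the location of population \(1\) is to the right of population \(2\).

We are given a \(5\%\) significance level. We notice that \(n_{1},n_{2}>10\), and so the sampling distribution of the test statistic is approximately normal. We find the test statistic \(T\) first. In the table below, the first \(15\) entries of the Data column are the new-drug ratings and the last \(15\) are the aspirin ratings; the Ordered and Rank columns list the combined sample in ascending order, with average ranks for ties: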

Data Ordered Rank
3 1 2
5 1 2
4 1 2
3 2 6
2 2 6
5 2 6
1 2 6
4 2 6
5 3 12
3 3 12
3 3 12
5 3 12
5 3 12
5 3 12
4 3 12
4 4 19.5
1 4 19.5
3 4 19.5
2 4 19.5
4 4 19.5
1 4 19.5
3 4 19.5
4 4 19.5
2 5 27
2 5 27
2 5 27
4 5 27
3 5 27
4 5 27
5 5 27

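and so we obtain that \(T_{1}=276.5\) and \(T_{2}=188.5\). Since the alternative hypothesis asks whether sample \(1\) (the new drug) is located to the right, we work with the rank sum of sample \(1\), which gives our test statistic as

\[ T=T_{1}=276.5 \]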

Before finding the \(z\)-score, we calculate

\[ \mu_{T}=\frac{(15)(15+15+1)}{2}=232.5 \]

and

\[ \sigma_{T}=\sqrt{\frac{(15)(15)(15+15+1)}{12}}\approx24.11 \]

and so we obtain that

\[ z=\frac{276.5-232.5}{24.11}=1.82 \]

This is a one-sided (right-tailed) test, and so we will reject \(H_{0}\) if the \(z\)-score is greater than the critical value

Code
####################
# CRITICAL VALUE
###################

zcrit <- qnorm(0.05, lower.tail=FALSE)
zcrit
[1] 1.644854

which it clearly is. We would also reject if the \(p\)-value is less than \(0.05\)

Code
#############
# P-VALUE
############

pval <- pnorm(1.82, lower.tail=FALSE)
pval
[1] 0.0343795

which, again, it clearly is.

Conclusion: We reject \(H_{0}\) and conclude that there is significant evidence that the new drug is perceived to be more effective than aspirin.
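The large-sample calculation can be reproduced in R as a sketch. Like the hand calculation, it uses the \(\sigma_{T}\) formula without a correction for ties; with this many tied ratings, software that applies a tie correction will report a slightly different \(z\).

```r
new_drug <- c(3, 5, 4, 3, 2, 5, 1, 4, 5, 3, 3, 5, 5, 5, 4)   # sample 1
aspirin  <- c(4, 1, 3, 2, 4, 1, 3, 4, 2, 2, 2, 4, 3, 4, 5)   # sample 2
n1 <- length(new_drug)
n2 <- length(aspirin)

r  <- rank(c(new_drug, aspirin))             # heavy ties: average ranks
T1 <- sum(r[seq_len(n1)])                    # rank sum of the new drug

mu_T    <- n1 * (n1 + n2 + 1) / 2
sigma_T <- sqrt(n1 * n2 * (n1 + n2 + 1) / 12) # no tie correction, matching the text
z       <- (T1 - mu_T) / sigma_T
p_value <- pnorm(z, lower.tail = FALSE)      # right-tailed: new drug rated higher

print(c(T1 = T1, mu_T = mu_T, sigma_T = sigma_T, z = z, p = p_value))
```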

4. Kruskal-Wallis Test

Important: Key Idea

A Kruskal-Wallis test is used when we want to compare two or more independent groups/samples of ordinal data or quantitative data with respect to their medians.

It is the equivalent of a single factor ANOVA.

4.1. Data and Assumptions

  1. The data is either ordinal or quantitative, but not necessarily normal
  2. The treatment levels and observations within each treatment level are independent
  3. There are, at least, three observations per group/sample
  4. The distributions of the groups differ with respect to their location (median) only, if they differ at all

4.2. Hypothesis Testing for the Kruskal-Wallis Test

4.2.1. Hypotheses

We have the following:

\[ H_{0}: \text{the locations of the $k$ populations (groups) are the same} \]

\[ H_{1}: \text{the locations of at least two populations differ} \]

4.2.2. Calculating the Test Statistic

  1. We combine the observations from all the \(k\) groups to form one sample. This sample will have \(n_{T}=\sum_{i=1}^{k}n_{i}\) observations.
  2. Then, we rank the observations, averaging ranks for all tied observations
  3. We calculate the sum of ranks, \(T_{1}, T_{2},\dots,T_{k}\), for all the \(k\) groups
Note

As a consequence of this, we have that

\[ \sum_{i=1}^{k}T_{i}=\frac{n_{T}(n_{T}+1)}{2} \]

The test statistic is then given by

\[ H=\left[\frac{12}{n_{T}(n_{T}+1)}\sum_{i=1}^{k}\left(\frac{T^{2}_{i}}{n_{i}}\right)\right]-3(n_{T}+1) \]

Note

If all the populations have the same location, i.e., \(H_{0}\) is true, then the ranks should be spread evenly among the \(k\) samples and the \(H\) statistic will be small. Here, “small” means “sufficiently close to zero”.

4.2.3. Critical Region

When each of the \(k\) groups has at least three observations, the sampling distribution of \(H\) is approximately a chi-squared distribution with \(k-1\) degrees of freedom. Since large values of \(H\) indicate that the locations differ, the test is one-sided: we reject \(H_{0}\) if \(H\) is too large (\(H \geq c\)) for some critical value \(c\), or equivalently if \(p \leq \alpha\) for the chosen significance level \(\alpha\).

Note

If you are wondering how to find the critical region when \(n_{i}<3\): we don't. Here, the Kruskal-Wallis test is only used when \(n_{i} \geq 3\) for all \(k\) groups, because this is the condition under which \(H\) approximately follows a chi-squared distribution.

WarningWorked Example

A 24-hour restaurant wanted to determine how customers rate its three shifts with respect to speed of service. Three samples of \(10\) customer response cards were randomly selected, one sample from each shift, and the customer ratings (from \(1\) for “very slow” to \(5\) for “very quick”) were recorded. The ratings are shown in the following table, with the rank of each rating in the pooled sample in parentheses.

4:00 - midnight    midnight - 8:00    8:00 - 4:00
4 (27) 3 (16.5) 3 (16.5)
4 (27) 4 (27) 1 (2)
3 (16.5) 2 (6.5) 3 (16.5)
4 (27) 2 (6.5) 2 (6.5)
3 (16.5) 3 (16.5) 1 (2)
3 (16.5) 4 (27) 3 (16.5)
3 (16.5) 3 (16.5) 4 (27)
3 (16.5) 3 (16.5) 2 (6.5)
2 (6.5) 2 (6.5) 4 (27)
3 (16.5) 3 (16.5) 1 (2)

Can we conclude that customers perceive the speed of service to be different among the three shifts at a 5 percent significance level?

We have our hypotheses:

\[ H_{0}: \text{there is no difference in perception of the speed of service} \]

\[ H_{1}: \text{there is a difference in the perception of the speed of service} \]
From the table, we find that

\[ T_{1}=186.5 \quad T_{2}=156 \quad T_{3}=122.5 \]

and we can calculate the test statistic as

\[ H=\frac{12}{30(30+1)}\left(\frac{(186.5)^{2}}{10}+\frac{(156)^{2}}{10}+\frac{(122.5)^{2}}{10}\right)-3(30+1)=2.645 \]
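The calculation of \(H\) can be sketched in R. This assumes the ratings are entered exactly as listed in the table above (the shift labels are our own). `rank()` averages the ranks of tied observations by default, which matches step 2 of the procedure. The rank sums should also satisfy \(\sum T_{i}=n_{T}(n_{T}+1)/2=465\).

```r
###############################
# KRUSKAL-WALLIS H BY HAND
###############################

# Ratings for each shift, as listed in the table
ratings <- c(4, 4, 3, 4, 3, 3, 3, 3, 2, 3,   # 4:00 - midnight
             3, 4, 2, 2, 3, 4, 3, 3, 2, 3,   # midnight - 8:00
             3, 1, 3, 2, 1, 3, 4, 2, 4, 1)   # 8:00 - 4:00
shift_levels <- c("4:00-midnight", "midnight-8:00", "8:00-4:00")
shift <- factor(rep(shift_levels, each = 10), levels = shift_levels)

# Rank the pooled sample (ties receive the average rank)
r <- rank(ratings)

# Rank sums T_i, group sizes n_i and total sample size n_T
T_i <- tapply(r, shift, sum)
n_i <- tapply(r, shift, length)
n_T <- length(ratings)

# Uncorrected Kruskal-Wallis statistic and its chi-squared p-value
H <- 12 / (n_T * (n_T + 1)) * sum(T_i^2 / n_i) - 3 * (n_T + 1)
p <- pchisq(H, df = length(T_i) - 1, lower.tail = FALSE)
c(H = H, p = p)
```

R's built-in `kruskal.test(ratings ~ shift)` applies a correction for ties. This data set contains many ties, so it reports a somewhat larger statistic than the uncorrected \(H\) above, but the decision at \(\alpha=0.05\) is the same.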

We can then find the critical value (and hence the critical region)

Code
##########
# CRIT
#########

k <- 3
chi_crit <- qchisq(0.05, df=k-1, lower.tail=F)
chi_crit
[1] 5.991465

and the \(p\)-value

Code
##############
# p-value
##############

p <- pchisq(2.645, k-1, lower.tail=F)
p
[1] 0.2664683

Conclusion: Since \(H=2.645\) is less than the critical value \(5.991\) and \(p \approx 0.266 > 0.05\), we fail to reject the null hypothesis. There is insufficient evidence to conclude that customers perceive the speed of service differently across the three shifts.

5. Friedman Test

ImportantKey Idea

A Friedman test is used when comparing more than two groups or samples of ordinal or quantitative data, using matched or blocked samples, with respect to their (median) locations.

A Friedman test is the non-parametric equivalent of a two-way ANOVA for a randomised block design without interactions.

5.1. Data and Assumptions

  1. Data is either ordinal or quantitative, but not necessarily normal
  2. The data comes from a blocked experiment with \(b\) blocks
  3. The measurements within a block are dependent
  4. The measurements between blocks are independent
  5. No interaction between blocks and treatments

5.2. Hypothesis Testing for the Friedman Test

Before looking at how the Friedman test is carried out, it is worth examining the structure of the experiments it is used to analyse.

Recall that blocking is introduced into an experiment to improve the comparison of treatments. The experimental units are grouped into blocks of units that are similar with respect to some characteristic. Each block contains the same number of experimental units, and each treatment is applied exactly once within every block. So,

\[ \text{number of units in each block}=\text{number of treatments} \]

Here is an example of this:

Treatment Block 1 Block 2 Block 3 Block 4
1 \(y_{11}\) \(y_{12}\) \(y_{13}\) \(y_{14}\)
2 \(y_{21}\) \(y_{22}\) \(y_{23}\) \(y_{24}\)
3 \(y_{31}\) \(y_{32}\) \(y_{33}\) \(y_{34}\)
4 \(y_{41}\) \(y_{42}\) \(y_{43}\) \(y_{44}\)
5 \(y_{51}\) \(y_{52}\) \(y_{53}\) \(y_{54}\)

The test then assesses whether the \(k\) treatments differ in their medians (locations).

5.2.1. Hypotheses

We have the following:

\[ H_{0}: \text{the locations of the $k$ populations are the same} \]

\[ \text{and} \]

\[ H_{1}: \text{at least two population locations differ} \]

Tip

Remember to state and interpret your hypotheses in the context of the question you are trying to answer.

5.2.2. Calculating the Test Statistic

  1. Rank the observations from smallest to largest within each block
  2. Average ranks of tied observations within the same block
  3. Calculate the rank sums \(T_{1}, T_{2}, \dots, T_{k}\) for all the \(k\) treatments

The test statistic is then given by

\[ F_{r}=\left[\frac{12}{bk(k+1)}\sum_{j=1}^{k}T_{j}^{2}\right]-3b(k+1) \]

where

  • \(b\) is the number of blocks

  • \(k\) is the number of treatments.

Provided that \(k \geq 5\) or \(b \geq 5\), \(F_{r}\) approximately follows a chi-squared distribution with \(k-1\) degrees of freedom.

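We then reject the null hypothesis if \(F_{r}\) is too large under the assumption that \(H_{0}\) is true, i.e., if it exceeds the upper-tail critical value of the chi-squared distribution with \(k-1\) degrees of freedom.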

WarningWorked Example

Four managers evaluate applicants for a job in an accounting firm on several dimensions including academic credentials, previous work experience and personal suitability. Each manager then summarises the results and produces an evaluation of the candidates. There are \(5\) possibilities:

  1. The candidate is in the top \(5\%\) of applicants
  2. The candidate is in the top \(10\%\) of applicants, but not in the top \(5\%\)
  3. The candidate is in the top \(25\%\) of applicants, but not in the top \(10\%\)
  4. The candidate is in the top \(50\%\) of applicants, but not in the top \(25\%\)
  5. The candidate is in the bottom \(50\%\) of applicants

Eight applicants were randomly selected, and their evaluations by the four managers were recorded.

Applicant Manager 1 Manager 2 Manager 3 Manager 4
1 2 1 2 2
2 4 2 3 2
3 2 2 2 3
4 3 1 3 2
5 3 2 3 5
6 2 2 3 4
7 4 1 5 5
8 3 2 5 3

Can we say that there are differences in the way the managers evaluate candidates?

Here, we are trying to determine how being scored by a particular manager affects which category an applicant is placed in. So, the treatments are the managers. The blocking factor is the applicants themselves, since every applicant is evaluated by every manager.

To identify the treatment, always ask yourself, “What effect are we trying to measure?” Here we want to measure the effect that each manager has on the scoring, so the managers are the treatment.

The blocks usually follow from this. If they do not, ask yourself, “What is measured repeatedly, once under each treatment?”

Notice also that the observations within each block (applicant) are dependent, since they are all measurements of the same applicant. This makes sense: a stronger applicant is likely to be rated favourably by all four managers.

For the hypotheses, we have

\[ H_{0}: \text{there is no difference in the way that managers evaluate candidates} \]

\[ H_{1}: \text{there is a difference in the way that managers evaluate candidates} \]

To calculate the test statistic, we first rank within the blocks to obtain the sum of ranks. We have the following:

Applicant Manager 1 Manager 2 Manager 3 Manager 4
1 2 (3) 1 (1) 2 (3) 2 (3)
2 4 (4) 2 (1.5) 3 (3) 2 (1.5)
3 2 (2) 2 (2) 2 (2) 3 (4)
4 3 (3.5) 1 (1) 3 (3.5) 2 (2)
5 3 (2.5) 2 (1) 3 (2.5) 5 (4)
6 2 (1.5) 2 (1.5) 3 (3) 4 (4)
7 4 (2) 1 (1) 5 (3.5) 5 (3.5)
8 3 (2.5) 2 (1) 5 (4) 3 (2.5)

and we get the sum of ranks as \(T_{1}=21\), \(T_{2}=10\), \(T_{3}=24.5\), and \(T_{4}=24.5\). We can then calculate the test statistic. We obtain that

\[ F_{r}=\left[\frac{12}{(8)(4)(4+1)}\left((21)^{2}+(10)^{2}+(24.5)^{2}+(24.5)^{2}\right)\right]-3(8)(4+1)=10.61 \]
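The within-block ranking and the formula for \(F_{r}\) can be sketched in R. The scores are entered as they appear in the ranked table (so Applicant 6 receives a 4 from Manager 4). Applicants (blocks) are the rows and managers (treatments) are the columns. The variable names are illustrative.

```r
###############################
# FRIEDMAN F_r BY HAND
###############################

# Rows = applicants (blocks), columns = managers (treatments)
scores <- matrix(c(2, 1, 2, 2,
                   4, 2, 3, 2,
                   2, 2, 2, 3,
                   3, 1, 3, 2,
                   3, 2, 3, 5,
                   2, 2, 3, 4,
                   4, 1, 5, 5,
                   3, 2, 5, 3),
                 nrow = 8, byrow = TRUE,
                 dimnames = list(NULL, paste("Manager", 1:4)))

b <- nrow(scores)   # number of blocks
k <- ncol(scores)   # number of treatments

# Rank within each block (row), averaging ties;
# apply() returns one column per row, so transpose back
ranks <- t(apply(scores, 1, rank))

# Rank sums T_j for each treatment
T_j <- colSums(ranks)

# Friedman statistic and its chi-squared p-value
F_r <- 12 / (b * k * (k + 1)) * sum(T_j^2) - 3 * b * (k + 1)
p <- pchisq(F_r, df = k - 1, lower.tail = FALSE)
c(F_r = F_r, p = p)
```

`friedman.test(scores)` accepts a matrix in this layout (blocks as rows, treatments as columns). However, it adjusts for ties within blocks, so its statistic can differ from the hand-computed \(F_{r}\).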

We can find the critical value (and therefore the critical region)

Code
#################### 
# CRITICAL VALUE  
####################

k <- 4
crit <- qchisq(0.05, df=k-1, lower.tail=F)
crit
[1] 7.814728

and the \(p\)-value associated with the test statistic.

Code
############ 
# P VALUE  
############

p <- pchisq(10.61, df=k-1, lower.tail=F)
p
[1] 0.01403297

Conclusion: Based on the test statistic being more extreme than the critical value, and having a \(p\)-value less than \(0.05\), we reject the null hypothesis and conclude that there is evidence of a difference in the way that the different managers evaluate the candidates.

6. Spearman Rank Correlation Coefficient Test

ImportantKey Idea

The Spearman Rank Correlation Coefficient Test is used to test for a (monotonic) association between two paired variables of ordinal or quantitative data.

This test is the non-parametric equivalent of the test based on Pearson’s correlation coefficient.

6.1. Data and Assumptions

  1. Both variables are, at least, ordinal (though, they may be quantitative), and at least one variable is not normal
  2. There are a total of \(n\) randomly selected paired observations
Note

Spearman’s rank correlation coefficient is interpreted the same way as Pearson’s correlation. That is,

\[ -1 \leq r_{s} \leq 1 \]

and

  • \(-1 \implies\) perfect negative relationship

  • \(-0.5 \implies\) moderate negative relationship

  • \(0 \implies\) no relationship

  • \(0.5 \implies\) moderate positive relationship

  • \(+1 \implies\) perfect positive relationship

6.2. Hypothesis Testing for Spearman Rank Correlation Test

6.2.1. Hypotheses

The null hypothesis is given by

\[ H_{0}: \rho_{s}=0 \text{ (no association between the two variables in the underlying population)} \]

and the alternative hypotheses can either be one-sided or two-sided. For a two-sided alternative hypothesis, we have

\[ H_{1}: \rho_{s} \neq 0 \text{ (there is an association between the two variables in the underlying population)} \]

and, for the one-sided alternative hypotheses, we have

\[ H_{1}: \rho_{s}>0 \text{ (positive correlation)} \]

\[ \text{and} \]

\[ H_{1}: \rho_{s}<0 \text{ (negative correlation)} \]

6.2.2. Calculating the Test Statistic

To calculate the test statistic, we

  1. Rank the observations of each variable separately, averaging the ranks of tied observations

  2. Calculate the difference, \(d\), within each pair of ranks. So,

    \[ d_{i}=\text{rank}(x_{i})-\text{rank}(y_{i}) \]

  3. The test statistic is then given by

    \[ r_{s}=1-\frac{6\sum_{i=1}^{n}d^{2}_{i}}{n(n^{2}-1)} \]

where \(n\) is the number of pairs of data
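The shortcut formula agrees exactly with Spearman's coefficient, defined as Pearson's correlation computed on the ranks, only when there are no ties. With tied ranks it is an approximation. The sketch below checks this on a small, made-up data set with no ties (the values of `x` and `y` are hypothetical).

```r
###############################
# SPEARMAN SHORTCUT FORMULA
###############################

# Hypothetical paired data with no ties (illustration only)
x <- c(10, 20, 30, 40, 50)
y <- c(3, 1, 4, 2, 5)

# Steps 1 and 2: rank each variable separately, then take differences
d <- rank(x) - rank(y)
n <- length(d)

# Step 3: shortcut formula for r_s
r_s <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

# Pearson's correlation of the ranks, computed by cor()
r_s_cor <- cor(x, y, method = "spearman")
c(shortcut = r_s, cor = r_s_cor)
```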

For large samples (\(n \geq 10\)), the sampling distribution of the test statistic \(r_{s}\) is approximately normal, and the \(z\)-score is given by

\[ z=\frac{r_{s}-\mu_{r_{s}}}{\sigma_{r_{s}}} \]

where \(\mu_{r_{s}} = 0\) under the assumption that \(H_{0}\) is true and \(\sigma_{r_{s}} = \sqrt{\frac{1}{n-1}}=\frac{1}{\sqrt{n-1}}\). From this, we can simplify the \(z\) calculation to

\[ z=r_{s}\sqrt{n-1} \]

under the assumption that \(H_{0}\) is true.

6.2.3. Conclusion

We then reject the null hypothesis if

  • \(|z| \geq z_{\alpha/2}\) for a two-sided test

  • \(z>z_{\alpha}\) for a right-tailed test; and

  • \(z<-z_{\alpha}\) for a left-tailed test; OR

  • if the \(p\)-value is less than the defined \(\alpha\)
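The rejection rules above can be wrapped in a small helper. This is a hypothetical function (`z_decision` is not part of any package). It returns the \(p\)-value for the chosen alternative and whether that \(p\)-value falls below \(\alpha\), which is equivalent to comparing \(z\) with the corresponding critical value.

```r
###############################
# DECISION RULES FOR THE z-TEST
###############################

# Hypothetical helper: p-value and decision for a given z-score
z_decision <- function(z, alpha = 0.05,
                       alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  p <- switch(alternative,
              two.sided = 2 * pnorm(abs(z), lower.tail = FALSE), # |z| >= z_{alpha/2}
              greater   = pnorm(z, lower.tail = FALSE),          # z > z_alpha
              less      = pnorm(z))                              # z < -z_alpha
  list(p_value = p, reject = p < alpha)
}

z_decision(4.228, alternative = "greater")
```

For a right-tailed test with \(z=4.228\), the helper rejects \(H_{0}\), matching the decision in the worked example below.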

WarningWorked Example

After several semesters without much success, Pat Statstud (a student in the lowest quarter of a statistics course) decided to try and improve his performance. Pat needed to know the secret of success for university students.

After many hours of discussion with other more successful students, Pat postulated a rather radical theory: the longer one studied, the better one’s grade.

To test the theory, Pat took a random sample of 35 students in an economics course and asked each to report the average amount of time he or she studied economics, and the final mark (out of 100) obtained. The results are shown below.

Test to determine whether grade and study time are positively related.

The ranked data is as follows.

Code
###############################
# STUDY TIME VS MARK DATA
###############################


library(dplyr)
library(gt)

# Left block
left <- tibble(
  Time = c(30, 5, 36, 37, 32, 23, 34, 2, 34, 43, 34, 32, 30, 36, 40, 24, 0, 25),
  Rank_Time = c(17, 4, 30.5, 32, 22.5, 7, 28, 2.5, 28, 35, 28, 22.5, 17, 30.5, 34, 8.5, 1, 10.5),
  Mark = c(71, 30, 82, 98, 78, 73, 82, 25, 94, 99, 85, 74, 79, 82, 88, 55, 7, 62),
  Rank_Mark = c(9, 4, 17.5, 34, 14, 10.5, 17.5, 3, 32, 35, 22, 12, 15, 17.5, 26, 5, 1, 6)
)

# Right block
right <- tibble(
  Time = c(29, 21, 31, 30, 33, 30, 33, 22, 29, 24, 30, 2, 31, 33, 25, 38, 26),
  Rank_Time = c(13.5, 5, 20.5, 17, 25, 17, 25, 6, 13.5, 8.5, 17, 2.5, 20.5, 25, 10.5, 33, 12),
  Mark = c(91, 66, 66, 73, 90, 88, 91, 64, 83, 87, 96, 16, 84, 92, 82, 88, 75),
  Rank_Mark = c(29.5, 8, 23, 10.5, 28, 26, 29.5, 7, 20, 24, 33, 2, 21, 31, 17.5, 26, 13)
)

# Combine and format
data <- bind_rows(left, right)

data %>%
  gt() %>%
  tab_header(
    title = "Study Time vs Marks Dataset"
  ) %>%
  fmt_number(
    columns = everything(),
    decimals = 1
  ) %>%
  tab_options(
    table.font.size = "small"
  )
Study Time vs Marks Dataset
Time Rank_Time Mark Rank_Mark
30.0 17.0 71.0 9.0
5.0 4.0 30.0 4.0
36.0 30.5 82.0 17.5
37.0 32.0 98.0 34.0
32.0 22.5 78.0 14.0
23.0 7.0 73.0 10.5
34.0 28.0 82.0 17.5
2.0 2.5 25.0 3.0
34.0 28.0 94.0 32.0
43.0 35.0 99.0 35.0
34.0 28.0 85.0 22.0
32.0 22.5 74.0 12.0
30.0 17.0 79.0 15.0
36.0 30.5 82.0 17.5
40.0 34.0 88.0 26.0
24.0 8.5 55.0 5.0
0.0 1.0 7.0 1.0
25.0 10.5 62.0 6.0
29.0 13.5 91.0 29.5
21.0 5.0 66.0 8.0
31.0 20.5 66.0 23.0
30.0 17.0 73.0 10.5
33.0 25.0 90.0 28.0
30.0 17.0 88.0 26.0
33.0 25.0 91.0 29.5
22.0 6.0 64.0 7.0
29.0 13.5 83.0 20.0
24.0 8.5 87.0 24.0
30.0 17.0 96.0 33.0
2.0 2.5 16.0 2.0
31.0 20.5 84.0 21.0
33.0 25.0 92.0 31.0
25.0 10.5 82.0 17.5
38.0 33.0 88.0 26.0
26.0 12.0 75.0 13.0

We start with the null and alternative hypotheses. The null hypothesis is given by

\[ H_{0}: \text{more time spent studying doesn't improve one's grade } (\rho_{s}=0) \]

and the alternative hypothesis is

\[ H_{1}: \text{more time spent studying improves one's grade } (\rho_{s}>0) \]

We will test at the \(5\%\) significance level. To calculate the test statistic, we will need the differences. These are given in the table below.

Code
###############################
# STUDY TIME VS MARK DATA
###############################


library(dplyr)
library(gt)

# Left block
left <- tibble(
  Time = c(30, 5, 36, 37, 32, 23, 34, 2, 34, 43, 34, 32, 30, 36, 40, 24, 0, 25),
  Rank_Time = c(17, 4, 30.5, 32, 22.5, 7, 28, 2.5, 28, 35, 28, 22.5, 17, 30.5, 34, 8.5, 1, 10.5),
  Mark = c(71, 30, 82, 98, 78, 73, 82, 25, 94, 99, 85, 74, 79, 82, 88, 55, 7, 62),
  Rank_Mark = c(9, 4, 17.5, 34, 14, 10.5, 17.5, 3, 32, 35, 22, 12, 15, 17.5, 26, 5, 1, 6)
)

# Right block
right <- tibble(
  Time = c(29, 21, 31, 30, 33, 30, 33, 22, 29, 24, 30, 2, 31, 33, 25, 38, 26),
  Rank_Time = c(13.5, 5, 20.5, 17, 25, 17, 25, 6, 13.5, 8.5, 17, 2.5, 20.5, 25, 10.5, 33, 12),
  Mark = c(91, 66, 66, 73, 90, 88, 91, 64, 83, 87, 96, 16, 84, 92, 82, 88, 75),
  Rank_Mark = c(29.5, 8, 23, 10.5, 28, 26, 29.5, 7, 20, 24, 33, 2, 21, 31, 17.5, 26, 13)
)

# Combine and format
data <- bind_rows(left, right)

data %>%
  mutate(
    d_i = Rank_Time - Rank_Mark
  ) %>%
  gt() %>%
  tab_header(
    title = "Study Time vs Marks Dataset (with Differences)"
  ) %>%
  fmt_number(
    columns = everything(),
    decimals = 1
  ) %>%
  tab_options(
    table.font.size = "small"
  )
Study Time vs Marks Dataset (with Differences)
Time Rank_Time Mark Rank_Mark d_i
30.0 17.0 71.0 9.0 8.0
5.0 4.0 30.0 4.0 0.0
36.0 30.5 82.0 17.5 13.0
37.0 32.0 98.0 34.0 −2.0
32.0 22.5 78.0 14.0 8.5
23.0 7.0 73.0 10.5 −3.5
34.0 28.0 82.0 17.5 10.5
2.0 2.5 25.0 3.0 −0.5
34.0 28.0 94.0 32.0 −4.0
43.0 35.0 99.0 35.0 0.0
34.0 28.0 85.0 22.0 6.0
32.0 22.5 74.0 12.0 10.5
30.0 17.0 79.0 15.0 2.0
36.0 30.5 82.0 17.5 13.0
40.0 34.0 88.0 26.0 8.0
24.0 8.5 55.0 5.0 3.5
0.0 1.0 7.0 1.0 0.0
25.0 10.5 62.0 6.0 4.5
29.0 13.5 91.0 29.5 −16.0
21.0 5.0 66.0 8.0 −3.0
31.0 20.5 66.0 23.0 −2.5
30.0 17.0 73.0 10.5 6.5
33.0 25.0 90.0 28.0 −3.0
30.0 17.0 88.0 26.0 −9.0
33.0 25.0 91.0 29.5 −4.5
22.0 6.0 64.0 7.0 −1.0
29.0 13.5 83.0 20.0 −6.5
24.0 8.5 87.0 24.0 −15.5
30.0 17.0 96.0 33.0 −16.0
2.0 2.5 16.0 2.0 0.5
31.0 20.5 84.0 21.0 −0.5
33.0 25.0 92.0 31.0 −6.0
25.0 10.5 82.0 17.5 −7.0
38.0 33.0 88.0 26.0 7.0
26.0 12.0 75.0 13.0 −1.0

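Now we are ready to calculate the test statistic. Squaring and summing the differences gives \(\sum d_{i}^{2}=1962.5\), so

\[ r_{s}=1-6\left[\frac{(8)^{2}+(0)^{2}+(13)^{2}+\dots+(7)^{2}+(-1)^{2}}{35((35)^{2}-1)}\right]=1-\frac{6(1962.5)}{35((35)^{2}-1)}\approx0.7251 \]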

and the associated \(z\)-score will be

\[ z=0.7251\sqrt{35-1}=4.228 \]
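The test statistic, \(z\)-score and \(p\)-value can be reproduced in R from the two rank columns exactly as they appear in the table above. As in the hand calculation, this sketch uses the shortcut formula for \(r_{s}\), which is approximate here because there are tied ranks.

```r
###############################
# SPEARMAN TEST FROM THE RANKS
###############################

# Rank columns copied from the table (left block followed by right block)
rank_time <- c(17, 4, 30.5, 32, 22.5, 7, 28, 2.5, 28, 35, 28, 22.5, 17, 30.5,
               34, 8.5, 1, 10.5, 13.5, 5, 20.5, 17, 25, 17, 25, 6, 13.5, 8.5,
               17, 2.5, 20.5, 25, 10.5, 33, 12)
rank_mark <- c(9, 4, 17.5, 34, 14, 10.5, 17.5, 3, 32, 35, 22, 12, 15, 17.5,
               26, 5, 1, 6, 29.5, 8, 23, 10.5, 28, 26, 29.5, 7, 20, 24, 33,
               2, 21, 31, 17.5, 26, 13)

d <- rank_time - rank_mark
n <- length(d)

# Shortcut formula, large-sample z-score and right-tailed p-value (H1: rho_s > 0)
r_s <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))
z   <- r_s * sqrt(n - 1)
p   <- pnorm(z, lower.tail = FALSE)
c(r_s = r_s, z = z, p = p)
```

`cor(rank_time, rank_mark)` computes Pearson's correlation of the ranks directly. Because of the ties, it can differ slightly from the shortcut value.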

Since this is a one-sided (right-tailed) test, the critical region is given by

Code
####################
# CRITICAL POINT
####################

zcrit <- qnorm(0.05, lower.tail=F)
zcrit
[1] 1.644854

\(z \geq 1.645\), and the \(p\)-value for the test statistic is given by

Code
##################
# P-VALUE
#################

pv <- pnorm(4.228, lower.tail=F)
pv
[1] 1.178889e-05

Conclusion: We reject the null hypothesis since the test statistic falls into the rejection region, and the \(p\)-value is less than the significance level defined (\(\alpha=0.05\)). We then conclude that there is significant evidence of a positive relationship between the amount of time spent studying and the grade of a student.

7. Advantages and Disadvantages of Non-Parametric Statistical Techniques

ImportantAdvantages of Non-Parametric Tests
  • Can be used when parametric techniques are not suited to the data, or when the validity of their assumptions is uncertain

  • Useful for small sample sizes

  • The assumptions are usually few and easily met

  • They are not just restricted to quantitative data

ImportantDisadvantages of Non-Parametric Tests
  • Information is lost by ranking or taking signed ranks. As a result, non-parametric tests have less power (the probability of rejecting the null hypothesis when it is, in fact, false) than the equivalent parametric tests, when one is appropriate for the data