Nonparametric Tests [Part 2]

One-sample and paired two-sample inference (cont.)

In the previous article (https://rpubs.com/aaam2022/1068462), we discussed the sign test.

Despite its simplicity and fewer assumptions, the sign test has lower power than either the parametric t-test or the nonparametric Wilcoxon signed-rank (WSR) test. The sign test depends merely on the signs of the differences and ignores their magnitude. The WSR test addresses this by using the signed ranks of the differences, as illustrated below.

2. Wilcoxon signed-rank test

2.1 Applications

It is used for hypothesis testing:

About a population median using a sample drawn from that population.
To determine whether the median of differences of two populations is zero or another prespecified constant, using two paired samples.

2.2 Assumptions

The observations are a random sample of the population.
The distribution of the population (either of the single sample or of the differences between the two paired samples) is approximately symmetrical about its median (symmetrical but not necessarily normal).
The variable is measured on an interval or ratio scale (because the magnitude of the differences is considered when computing the test statistic).

|Note 1:

The classical WSR test assumes that the variable has a continuous distribution because it does not allow for ties in the ranks. However, this assumption is relaxed by allowing tied ranks and using mean ranks, as discussed below.

2.3 Rationale

Consider two paired random variables $X$ and $Y$ from which we have drawn two random samples. If we are interested in testing the hypothesis that the median $(M)$ of differences $D_i=Y_i-X_i$ is zero (i.e., the outcome on one condition is not higher or lower than the other), the null and alternative hypotheses can be formulated as follows:

\[H_0: \text {the median of differences} =0,\ H_1: \text {the median of differences} \ne0\]

Calculate the differences of the paired observations $D_i=Y_i-X_i$ (alternatively $D_i=X_i-M_0$ in the case of a one-sample test, where $M_0$ is the hypothesized median).
If any difference $D_i=0$, exclude it and reduce the sample size accordingly.
Rank the absolute values of the remaining differences $|D_i|$ from the smallest to the largest (i.e., rank without regard to the sign).
If two or more of the differences are equal, each tied value is assigned the mean of the ranks of the tied values. For example, if three equal absolute differences occupy ranks $3$, $4$, and $5$, then each is assigned the mean rank, $(3+4+5)/3=4$.
Assign the sign of the difference $(D_i)$ to each corresponding rank in the dataset.
Calculate the sum of the ranks with positive signs $(W^+)$ and the sum of the ranks with negative signs $(W^-)$.
If $H_0$ is true, the rank sums for positive and negative differences are expected to be about the same:
- |Note 2: Total rank sum of $n$ observations $W=W^++W^-=n(n+1)/2$
The test statistic is either the sum of the ranks of the positive differences $(W^+)$ or the smaller of the sum of ranks of positive differences and the sum of ranks of negative differences $(T)$:
- $R$ and $SPSS$ report the sum of the ranks of the positive differences. $R$ reports it under the name of $V$ instead of $W$.
To calculate the $p-$value, the exact null distribution of $W$ is used, which is obtained by enumerating all possible assignments of signs to the ranks.
By default, the built-in $R$ function $\text{wilcox.test()}$ computes an exact $p-$value if the sample contains fewer than $50$ finite values and there are no ties. If there are ties, $R$ issues the warning “cannot compute exact p-value with ties” and calculates the $p-$value from a normal approximation instead (which is also the case for large samples):
- If the $H_0$ is true, the expected value of the rank sum of positive differences $(W^+)$ is half the total rank sum (see above Note 2):
  
  \[ E_0(W^+)=\frac{n(n+1)}{4} \]
- The variance is calculated using the following equations:
  
  \[ Var_0(W^+) = \frac{n(n+1)(2n+1)}{24}\ \text {if there are no ties in the ranks} \]
  
  \[ Var_0(W^+) = \frac{n(n+1)(2n+1)-0.5 \sum ^C _{i=1}\ T_i(T_i-1)(T_i+1)}{24}\ \text{if there are ties in the ranks,} \]
  
  where $C$ is the number of groups with ties and $T_i$ is the number of observations in tie group $i$.
- Then, $Z=\large \frac{W^+-E_0(W^+)}{\sqrt{Var_0(W^+)}}$ approximately follows the standard normal distribution $N(0,1)$. $R$ applies a continuity correction by default, in which case the statistic becomes $Z=\large \frac{(W^+\pm0.5)-E_0(W^+)}{\sqrt{Var_0(W^+)}}$: use $(W^++0.5)$ when $W^+\lt E_0(W^+)$ and $(W^+-0.5)$ when $W^+\gt E_0(W^+)$.
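The normal approximation above can be traced by hand. The following is a minimal sketch in base $R$, assuming the paired scores of the example in the next section; the variable names are illustrative only. It computes $E_0(W^+)$, the tie-corrected variance, and the two-sided $p-$values with and without continuity correction.

```r
# Paired scores from the example in the next section
before <- c(2.5, 3.1, 2.9, 3.3, 3.5, 1.4, 4, 2, 5)
after  <- c(1.8, 2.1, 2.9, 2.3, 1.5, 3, 1.9, 3.2, 3.9)

d <- before - after
d <- d[d != 0]                     # exclude zero differences
n <- length(d)
r <- rank(abs(d))                  # tied values receive the mean rank
w_plus <- sum(r[d > 0])            # sum of ranks of positive differences

e0 <- n * (n + 1) / 4              # expected value of W+ under H0
tie_sizes <- as.vector(table(r))   # T_i; untied ranks have T_i = 1 and add 0
var0 <- (n * (n + 1) * (2 * n + 1) -
           0.5 * sum(tie_sizes * (tie_sizes - 1) * (tie_sizes + 1))) / 24

# Z without and with continuity correction (0.5 is moved towards E0)
z_plain     <- (w_plus - e0) / sqrt(var0)
z_corrected <- (w_plus - e0 - sign(w_plus - e0) * 0.5) / sqrt(var0)

p_plain     <- 2 * pnorm(-abs(z_plain))
p_corrected <- 2 * pnorm(-abs(z_corrected))

round(c(W_plus = w_plus, Z = z_plain, p = p_plain, p_corrected = p_corrected), 4)
```

Under these assumptions the two $p-$values should agree with those reported by $\text{wilcox.test()}$ in the example ($0.3264$ without and $0.3621$ with continuity correction).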

2.4 Paired Samples Example (with ties in ranks and zero differences)

The following scores were measured in a group of $9$ patients before and after applying a new therapy: $(2.5, 3.1, 2.9, 3.3, 3.5, 1.4, 4, 2, 5)$ and $(1.8, 2.1, 2.9, 2.3, 1.5, 3, 1.9, 3.2, 3.9)$, respectively. Can we conclude that the median of differences between the scores before and after treatment is different from $0$?

\[H_0: \text {the median of score differences} =0,\ H_1: \text {the median of score differences} \ne0\]

The differences between the paired observations in rows $2$ and $4$ are equal, so their ranks are tied. They occupy ranks $2$ and $3$, so each value has been assigned the mean rank of $2.5$. The difference in row $3$ is zero, so it has been excluded, leaving $n=8$. $W^+=25\ \text {and}\ W^-=11$. Fig. 1 shows that the distribution of the score differences is nearly symmetrical.
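The ranking steps can be reproduced in a few lines of base $R$. This is a minimal sketch: the data frame name $scores$ and its columns $before$ and $after$ follow the code used later in this article, while the helper columns $d$, $rank$, and $signed\text{_}rank$ are illustrative assumptions.

```r
# Paired scores, one row per patient
scores <- data.frame(
  before = c(2.5, 3.1, 2.9, 3.3, 3.5, 1.4, 4, 2, 5),
  after  = c(1.8, 2.1, 2.9, 2.3, 1.5, 3, 1.9, 3.2, 3.9)
)

scores$d <- scores$before - scores$after        # paired differences

# Rank the absolute non-zero differences; the zero difference gets no rank
nonzero <- scores$d != 0
scores$rank <- NA_real_
scores$rank[nonzero] <- rank(abs(scores$d[nonzero]))   # mean ranks for ties
scores$signed_rank <- sign(scores$d) * scores$rank

w_plus  <- sum(scores$rank[scores$d > 0])       # ranks of positive differences
w_minus <- sum(scores$rank[scores$d < 0])       # ranks of negative differences

print(scores)
c(W_plus = w_plus, W_minus = w_minus, total = w_plus + w_minus)
```

The total of the two rank sums equals $n(n+1)/2=36$ for the $n=8$ non-zero differences, as stated in Note 2.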

Fig. 1 Histogram of score difference

The median score before treatment is 3.1, while it is 2.3 after treatment.

Summary Statistics
Variable	N	Mean	Std. Dev.	Min	Pctl. 25	Pctl. 50	Pctl. 75	Max
before	9	3.1	1.1	1.4	2.5	3.1	3.5	5
after	9	2.5	0.78	1.5	1.9	2.3	3	3.9

Now, let’s carry out the WSR test using the built-in $R$ function.

wilcox.test(scores$before, scores$after, paired = TRUE, exact = TRUE)

## Warning in wilcox.test.default(scores$before, scores$after, paired = TRUE, :
## cannot compute exact p-value with ties

## Warning in wilcox.test.default(scores$before, scores$after, paired = TRUE, :
## cannot compute exact p-value with zeroes

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  scores$before and scores$after
## V = 25, p-value = 0.3621
## alternative hypothesis: true location shift is not equal to 0

The output shows two warnings because of the tied ranks and the zero difference. Therefore, the function falls back on the normal ($Z$) approximation with continuity correction despite the argument $exact\ = TRUE$. These warnings can be suppressed by specifying $exact\ = FALSE$.
To get $p-$value without continuity corrections, specify the argument $correct=FALSE$.
To get $CI$, specify the argument $conf.int=TRUE$.
With regard to $CI$, the following is quoted from the function help page:
- “If exact p-values are available, an exact confidence interval is obtained by the algorithm described in Bauer (1972), and the Hodges-Lehmann estimator is employed. Otherwise, the returned confidence interval and point estimate are based on normal approximations. These are continuity-corrected for the interval but not the estimate (as the correction depends on the alternative)”.
|Note 3: The (pseudo)median is not simply the median of the differences:
- With exact calculations, it is computed using the Hodges-Lehmann estimator, which is the median of the Walsh averages (the averages of all possible pairs of observations, including each observation paired with itself). If the distribution of the data is assumed to be symmetric, the Hodges-Lehmann estimator also estimates the median of differences.
- With the normal approximation, it is the value that, when subtracted from the observations, yields a rank sum $W^+$ equal to its expected value under $H_0$.
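To make the Hodges-Lehmann estimator concrete, here is a minimal sketch in base $R$ that builds the Walsh averages of the score differences and takes their median. It assumes the zero difference is dropped first, as $\text{wilcox.test()}$ does; the variable names are illustrative.

```r
before <- c(2.5, 3.1, 2.9, 3.3, 3.5, 1.4, 4, 2, 5)
after  <- c(1.8, 2.1, 2.9, 2.3, 1.5, 3, 1.9, 3.2, 3.9)

d <- before - after
d <- d[d != 0]                        # assumption: zero differences are dropped

# Walsh averages: (d_i + d_j) / 2 for every pair with i <= j
walsh <- outer(d, d, "+") / 2
walsh <- walsh[!lower.tri(walsh)]     # keep each pair once (diagonal included)

pseudomedian <- median(walsh)         # Hodges-Lehmann estimate
c(n_walsh = length(walsh), pseudomedian = pseudomedian)
```

With $8$ non-zero differences there are $8 \times 9/2=36$ Walsh averages, and their median is $0.875$, which is the (pseudo)median returned by the exact test further below.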

Let’s try the test again after specifying these arguments:

wilcox.test(scores$before, scores$after, paired = TRUE, exact = FALSE, correct = FALSE, conf.int = TRUE)

## 
##  Wilcoxon signed rank test
## 
## data:  scores$before and scores$after
## V = 25, p-value = 0.3264
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  -0.3000493  1.5500297
## sample estimates:
## (pseudo)median 
##      0.8947748

To get an exact $p-$value, we can use the $wilcox.exact()$ function from the $exactRankTests$ package or the $wilcoxsign\text{_}test()$ function from the $coin$ package. The latter has more options but requires the data to be in long format.

library(exactRankTests)
wilcox.exact(scores$before, scores$after, paired = TRUE, conf.int = TRUE)

## 
##  Exact Wilcoxon signed rank test
## 
## data:  scores$before and scores$after
## V = 25, p-value = 0.3672
## alternative hypothesis: true mu is not equal to 0
## 95 percent confidence interval:
##  -0.45  1.60
## sample estimates:
## (pseudo)median 
##          0.875

The $p-$value of the exact test is larger than that computed by the normal approximation without continuity correction, but it leads to the same conclusion of not rejecting $H_0$. The (pseudo)median is also very close to that based on the normal approximation.

To use $wilcoxsign\text{_}test()$ function, the dataset has been reshaped to long format as follows:
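The reshaping step itself is not shown in the article. The following is a minimal base $R$ sketch of one way to build it, assuming the column names $Score$, $Group$, and $ID$ used in the call below; the level order of $Group$ (before first) is also an assumption, chosen so that differences are taken as before minus after.

```r
# Wide format: one row per patient
scores <- data.frame(
  ID     = factor(1:9),
  before = c(2.5, 3.1, 2.9, 3.3, 3.5, 1.4, 4, 2, 5),
  after  = c(1.8, 2.1, 2.9, 2.3, 1.5, 3, 1.9, 3.2, 3.9)
)

# Long format: one row per patient and time point
scores_long <- data.frame(
  ID    = rep(scores$ID, times = 2),
  Group = factor(rep(c("before", "after"), each = nrow(scores)),
                 levels = c("before", "after")),   # assumed level order
  Score = c(scores$before, scores$after)
)

str(scores_long)
head(scores_long[order(scores_long$ID), ], 4)
```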

library(coin)
wilcoxsign_test(Score~Group|ID, data = scores_long, zero.method="Wilcoxon", distribution="exact")

## 
##  Exact Wilcoxon Signed-Rank Test
## 
## data:  y by x (pos, neg) 
##   stratified by block
## Z = 0.9814, p-value = 0.3672
## alternative hypothesis: true mu is not equal to 0

# distribution can also be set to "asymptotic"
# ID is the blocking factor (i.e., it identifies which observations belong to the same subject)

2.5 Pratt method for dealing with zero differences

In this method, all observations are ranked including the zeros in ascending order based on the absolute magnitude.
The ranks of the zeros are then dropped without changing the ranks of the non-zero values. Then proceed with the test as described.
Tables of critical values for a conditional exact Pratt test are available. Similarly, the exact distribution of the test statistic can be calculated, independent of tabulated values, via running through all $2^n$ sign permutations.
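The difference between the two ways of handling zeros, and the sign-enumeration idea, can be sketched in base $R$ as follows. This is an illustration under stated assumptions rather than the algorithm used by any particular package: the helper $exact\text{_}sign\text{_}p()$ is a hypothetical name, and the two-sided $p-$value is taken as the proportion of sign patterns whose $W^+$ lies at least as far from its null expectation as the observed value.

```r
before <- c(2.5, 3.1, 2.9, 3.3, 3.5, 1.4, 4, 2, 5)
after  <- c(1.8, 2.1, 2.9, 2.3, 1.5, 3, 1.9, 3.2, 3.9)
d <- before - after

# Hypothetical helper: exact two-sided p-value obtained by running through
# all 2^n assignments of signs to the supplied ranks
exact_sign_p <- function(ranks, positive) {
  n <- length(ranks)
  signs <- as.matrix(expand.grid(rep(list(c(0, 1)), n)))  # 2^n sign patterns
  w_all <- as.vector(signs %*% ranks)      # W+ under every pattern
  centre <- sum(ranks) / 2                 # expected W+ under H0
  w_obs <- sum(ranks[positive])
  mean(abs(w_all - centre) >= abs(w_obs - centre))
}

positive <- d[d != 0] > 0

# Pratt: rank all |d| including the zero, then drop the rank of the zero
r_pratt <- rank(abs(d))[d != 0]
p_pratt <- exact_sign_p(r_pratt, positive)

# Wilcoxon: drop the zero first, then rank
r_drop <- rank(abs(d[d != 0]))
p_drop <- exact_sign_p(r_drop, positive)

rbind(pratt = r_pratt, zeros_dropped = r_drop)
round(c(pratt = p_pratt, zeros_dropped = p_drop), 4)
```

With Pratt's method every non-zero rank is shifted up by one here, because the single zero occupies rank $1$. The two $p-$values should correspond to the exact results reported in this article ($0.3203$ with Pratt's method and $0.3672$ when the zero is dropped before ranking).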

wilcoxsign_test(Score~Group|ID, data = scores_long, zero.method="Pratt", distribution="exact")

## 
##  Exact Wilcoxon-Pratt Signed-Rank Test
## 
## data:  y by x (pos, neg) 
##   stratified by block
## Z = 1.069, p-value = 0.3203
## alternative hypothesis: true mu is not equal to 0

# specify zero.method to Pratt

Another function that implements Pratt’s method is $wsrTest()$ from $asht$ package.

library(asht)
wsrTest(scores$before, scores$after, conf.int = TRUE)

## 
##  Exact Wilcoxon Signed-Rank Test (with Pratt modification if zeros)
## 
## data:  scores$before minus scores$after
## p-value = 0.3203
## alternative hypothesis: true median is not equal to 0
## 95 percent confidence interval:
##  -0.45  1.55
## sample estimates:
## median (Hodges-Lehmann estimator) 
##                         0.7000229

Interpretation:

The data do not provide sufficient evidence to reject $H_0$. Therefore, we cannot conclude that the median of score differences before and after treatment differs from zero $(p=0.3203)$ using the exact Wilcoxon-Pratt signed-rank test. The Hodges-Lehmann estimator of the median of differences is $0.7 (95\% CI=-0.45 \ \text{to}\ 1.55)$. This estimate is similar to what is computed by $\text{SPSS}$ (Fig. 2).

Fig. 2 SPSS output

Pratt’s method is also implemented in $GraphPad Prism$ (Fig. 3).

Fig. 3 GraphPad Prism output

The boxplots of scores before and after treatment are depicted in Fig. 4.

Fig. 4 Effect of treatment on scores

2.6 Effect size

Matched-pairs rank-biserial $r$:

$r= \large \frac{W^+-W^-}{W}$
In the previous example, $r = (25-11)/36=0.389$
This can be done in $R$ using the $effectsize$ package

library(effectsize)
es <- rank_biserial(scores$before, scores$after, paired = TRUE)
es

## r (rank biserial) |        95% CI
## ---------------------------------
## 0.39              | [-0.31, 0.82]

interpret_rank_biserial(es$r_rank_biserial, rules = "cohen")

## [1] "moderate"
## (Rules: cohen1988)

# other rules are also available
# refer to the function help for a full list of r values interpretation

This effect size is also implemented in $Jamovi$ software (Fig. 5).

Fig. 5 Jamovi output

Normal approximation:

$ES=\large \frac{|Z|}{\sqrt{n}}$, where $Z$ is taken from the asymptotic Wilcoxon-Pratt signed-rank test and $n$ is the number of pairs of observations.
This can be implemented in $R$ using the $rstatix$ package.

library(rstatix)
wilcox_effsize(scores_long, Score~Group, paired = TRUE)

## # A tibble: 1 × 7
##   .y.   group1 group2 effsize    n1    n2 magnitude
## * <chr> <chr>  <chr>    <dbl> <int> <int> <ord>    
## 1 Score before after    0.356     9     9 moderate

# uses long format of the data
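The effect size above can also be traced by hand. The sketch below is an assumption-laden illustration in base $R$: it takes $Z$ as the standardized $W^+$ based on Pratt ranks, with the null variance computed from the observed ranks as $\sum r_i^2/4$, and divides by the square root of all $9$ pairs.

```r
before <- c(2.5, 3.1, 2.9, 3.3, 3.5, 1.4, 4, 2, 5)
after  <- c(1.8, 2.1, 2.9, 2.3, 1.5, 3, 1.9, 3.2, 3.9)

d <- before - after
n_pairs <- length(d)                    # all 9 pairs, zero difference included

r <- rank(abs(d))[d != 0]               # Pratt ranks of the non-zero differences
w_plus <- sum(r[d[d != 0] > 0])

# Asymptotic Z: W+ centred at half the rank total, variance sum(r^2) / 4
z <- (w_plus - sum(r) / 2) / sqrt(sum(r^2) / 4)
es <- abs(z) / sqrt(n_pairs)

round(c(Z = z, ES = es), 3)
```

Under these assumptions the result matches the $Z$ of the Wilcoxon-Pratt test ($1.069$) and the effect size reported by $wilcox\text{_}effsize()$ ($0.356$).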

2.7 Sample size calculation using G*Power

This section is quoted from the G*Power manual:
- “G*Power implements two different methods to estimate the power for the signed-rank Wilcoxon test:
  1. The asymptotic relative efficiency (ARE) method that defines power relative to the one sample t-test:
    - If a sample size $N$ is required to achieve a specified power for the Wilcoxon signed-rank test and a sample size $N'$ is required in the t-test to achieve the same power, then the ratio $N/N'$ is called the efficiency of the Wilcoxon signed-rank test relative to the one-sample t-test.
    - The limiting efficiency as sample size $N$ tends to infinity is called the ARE of the Wilcoxon signed-rank test relative to the t-test.
    - To estimate the power of the Wilcoxon test, scale the sample size with the corresponding ARE value and then perform the procedure for the t-test.
  2. Normal approximation to the power proposed by Lehmann.”
What you need to know is the effect size $(ES)$:
- G*Power suggests using the conventional values proposed by Cohen for the t-test:
  - small $d = 0.2$
  - medium $d = 0.5$
  - large $d = 0.8$
- Alternatively, you can use the means and standard deviations from a pilot study to calculate it in G*Power.
In the following video, we will use G*Power to calculate the minimum required sample size using a medium effect size, a power of $80\%$, and $\alpha=0.05$. We will use the ARE method, specifying the parent distribution as min. ARE, which is the most conservative choice.
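The idea behind the ARE method can be roughed out in base $R$. This is only a sketch and not G*Power's own algorithm: it assumes that the one-sample t-test sample size from $\text{power.t.test()}$ is simply divided by the minimal ARE of the signed-rank test relative to the t-test $(0.864)$ and rounded up, so the result may differ slightly from the G*Power output.

```r
are_min <- 0.864    # assumed: minimal ARE of the signed-rank test vs. the t-test

# Sample size of a one-sample (paired-differences) t-test for d = 0.5,
# two-sided alpha = 0.05 and power = 0.80
t_design <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80,
                         type = "one.sample", alternative = "two.sided")
n_t <- t_design$n                     # fractional sample size for the t-test

# Inflate by the ARE to approximate the signed-rank requirement
n_wsr <- ceiling(n_t / are_min)

c(n_t_test = ceiling(n_t), n_signed_rank = n_wsr)
```

Dividing by the minimal ARE inflates the sample size the most, which is why this choice of parent distribution is the most conservative.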

References

Bland, M. (2015). An Introduction to Medical Statistics. United Kingdom: OUP Oxford.
Corder, G. W., Foreman, D. I. (2014). Nonparametric Statistics: A Step-by-Step Approach. Germany: Wiley.
Daniel, W. W., Cross, C. L. (2013). Biostatistics: A Foundation for Analysis in the Health Sciences. Singapore: Wiley.
G*Power 3.1 manual. https://www.psychologie.hhu.de/fileadmin/redaktion/Fakultaeten/Mathematisch-Naturwissenschaftliche_Fakultaet/Psychologie/AAP/gpower/GPowerManual.pdf
Rey, D., Neuhäuser, M. (2011). Wilcoxon-Signed-Rank Test. In: Lovric, M. (eds) International Encyclopedia of Statistical Science. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04898-2_616.