Quiz 7A (Non-Parametric Tests – WSRS & MWU)

Scenario 1

Students were asked to estimate the population of Canada (in millions), which at the time was approximately 30 million. Before making their estimates, one group of ten students (Sample 1) was told that the population of the United States was about 290 million. A second group of nine students (Sample 2) was told separately that the population of Australia was about 20 million. The purpose of the study is to determine whether the reference information influenced the students’ estimates of the Canadian population.

At the 5% significance level, test whether there is a difference in the Canadian population estimates between the two groups. Start your analysis by inspecting the data (use box plots to visualise the two samples).

Sample 1 (USA) 2 30 35 70 100 120 135 150 190 200
Sample 2 (AUS) 8 12 16 29 35 40 45 46 95
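
The box plots requested above can be drawn with base R. The following is a minimal sketch rather than a prescribed solution; the vector names `usa` and `aus` match those used in the R code later in this document, while the axis label and title are my own choices. `boxplot.stats()` is included so that the five-number summaries and any points beyond the whiskers can be read off directly.

```r
# Samples from the scenario (estimates of Canada's population, in millions)
usa <- c(2, 30, 35, 70, 100, 120, 135, 150, 190, 200)
aus <- c(8, 12, 16, 29, 35, 40, 45, 46, 95)

# Side-by-side box plots of the two groups
boxplot(list(USA = usa, AUS = aus),
        ylab = "Estimated population of Canada (millions)",
        main = "Estimates by reference country")

# Five-number summaries and any points beyond the 1.5 x IQR whiskers
usa_stats <- boxplot.stats(usa)
aus_stats <- boxplot.stats(aus)
print(usa_stats$stats)  # min whisker, lower hinge, median, upper hinge, max whisker
print(aus_stats$stats)
print(usa_stats$out)    # points flagged as outliers in the USA group
print(aus_stats$out)    # points flagged as outliers in the AUS group
```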

MCQ 1

Question 1 (2 points)

Which non-parametric test is appropriate to use for this data?

  1. Wilcoxon signed-rank test since we have categorical data and two dependent samples.

  2. Wilcoxon signed-rank test since we have quantitative data and two dependent samples.

  3. Mann-Whitney U test since we have categorical data and two independent samples.

  4. Mann-Whitney U test since we have quantitative data and two independent samples.

In this scenario, we have two independent samples, since we are comparing the estimates of two separate groups who were given different pieces of information. The observations within each group are also independent. A Mann-Whitney U test is appropriate here since we do not have matched pairs of data. Notice also that our data is quantitative.

So, we choose option D.

Question 2 (2 points)

What type of data do we have here, and what is the relationship between the samples?

  1. The data collected in this experiment is on the interval scale and the two samples are paired.

  2. The data collected in this experiment is on the ratio scale and the two samples are independent.

  3. The data collected in this experiment is on the interval scale and the two samples are independent.

  4. The data collected in this experiment is on the ratio scale and the two samples are paired.

The data collected here is quantitative. Since a value of \(0\) would literally mean the absence of a population, the scale has a true zero and the data are ratio-scaled. The two samples are also independent, as established in Question 1. So, we choose option B.

MCQ 2

Question 3 (2 points)

Why might a parametric test (e.g., a t-test) not be appropriate in this context? (Multi-Select)

  1. The sample sizes are quite small, which makes it difficult to assess normality and the t-test assumes normality, especially with small samples.
  2. One of the samples contains an outlier which may skew the results of a t-test.
  3. The data are categorical, and the t-test cannot be used on categorical variables.
  4. The variances appear to differ considerably between groups, which violates the equal variance assumption of a standard t-test.
  5. A parametric test is appropriate since there is no reason to suspect that the data are not normally distributed.

A is valid because, with small samples, the \(t\)-test relies on the data being approximately normal; the central limit theorem only kicks in for larger values of \(n\), usually \(n>30\) and, at the very least, \(n>10\). Since we only have \(10\) and \(9\) data values in the two samples, it is difficult to conclude that the data truly follow a normal distribution.

A box plot flags the value \(95\) in Sample 2 as an outlier under the usual \(1.5\times\text{IQR}\) rule, and it can be argued that the value \(2\) in Sample 1 is unusually low as well. Such values may skew the results of a \(t\)-test. So, B is a correct option.

Also, the variance of sample 2 is clearly much smaller than that of sample 1. This violates the equal variance assumption of the standard \(t\)-test. We therefore also choose D.
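
The claim about unequal spread can be checked numerically. This is a small sketch using the same `usa` and `aus` vectors as the R code elsewhere in this document; the names `spread` and `var_ratio` are introduced here only for illustration.

```r
# Samples from Scenario 1
usa <- c(2, 30, 35, 70, 100, 120, 135, 150, 190, 200)
aus <- c(8, 12, 16, 29, 35, 40, 45, 46, 95)

# Sample size, standard deviation and variance for each group
spread <- data.frame(
  group = c("USA", "AUS"),
  n     = c(length(usa), length(aus)),
  sd    = c(sd(usa), sd(aus)),
  var   = c(var(usa), var(aus))
)
print(spread)

# Ratio of the two sample variances (larger over smaller)
var_ratio <- var(usa) / var(aus)
print(var_ratio)
```

A ratio far from 1 is what makes the pooled-variance \(t\)-test questionable here.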

Question 4 (2 points)

When would a parametric test be preferred over the non-parametric equivalent, and why?

  1. When the assumption of normality is reasonably satisfied, because parametric tests are more powerful when the assumptions of such tests are met.
  2. When means are more meaningful than medians, because parametric tests focus on central tendency.
  3. When the data are ordinal and contain outliers, because parametric tests are more robust to extreme values.
  4. When the sample size is very small and the data are skewed, because parametric tests always produce more accurate results in small samples.

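Options C and D are false. Parametric tests require quantitative data and are less, not more, robust to extreme values than rank-based tests, which rules out C. They are also least reliable when the sample is very small and the data are skewed, because normality can then neither be assumed nor checked, which rules out D.

Although some argument can be made for B, its justification is flawed: it claims that parametric tests focus on central tendency, but this is also true of non-parametric tests, since the median is a measure of central tendency as well.

A correctly describes when a parametric test is preferred, and so this is the choice we go with.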

Numeric Questions

Question 5 (1 point)

Conduct the test manually. What is the final value of the test statistic used to draw a conclusion? 

Sample 1 (USA) 2 30 35 70 100 120 135 150 190 200
Rank 1 6 7.5 12 14 15 16 17 18 19
Sample 2 (AUS) 8 12 16 29 35 40 45 46 95
Rank 2 3 4 5 7.5 9 10 11 13

From this table, we find our rank sums as \(T_{1}=125.5\) and \(T_{2}=64.5\). As a result, the test statistic is given by

\[ T=\text{min}(T_{1},T_{2})=T_{2}=64.5 \]
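
The manual ranking can be reproduced in R as a check. This is a sketch, not part of the quiz: `rank()` assigns average ranks to ties by default, which matches the 7.5 given to the two values of 35. The names `T1`, `T2` and `U1` are introduced here for illustration; `U1` is the rank sum of the first sample minus \(n_{1}(n_{1}+1)/2\), which is the quantity `wilcox.test()` reports as `W`.

```r
# Samples from Scenario 1
usa <- c(2, 30, 35, 70, 100, 120, 135, 150, 190, 200)
aus <- c(8, 12, 16, 29, 35, 40, 45, 46, 95)
n1 <- length(usa)
n2 <- length(aus)

# Rank the pooled data; tied values receive the average of their ranks
ranks <- rank(c(usa, aus))

# Rank sums for each sample
T1 <- sum(ranks[seq_len(n1)])
T2 <- sum(ranks[n1 + seq_len(n2)])
print(c(T1 = T1, T2 = T2))

# Sanity check: the rank sums must add up to N(N + 1)/2
N <- n1 + n2
stopifnot(T1 + T2 == N * (N + 1) / 2)

# U statistic for the first sample, reported as W by wilcox.test()
U1 <- T1 - n1 * (n1 + 1) / 2
print(U1)
```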

Question 6 (1 point)

Conduct the test in R with the in-built function. What is the p-value associated with the test statistic? 

#######################
# MANN-WHITNEY U TEST
#######################

# samples given 
usa <- c(2, 30, 35, 70, 100, 120, 135, 150, 190, 200)
aus <- c(8, 12, 16, 29, 35, 40, 45, 46, 95)

# mann-whitney u test

wilcox.test(usa, aus, exact=FALSE) # two-sided test: order doesn't matter

    Wilcoxon rank sum test with continuity correction

data:  usa and aus
W = 70.5, p-value = 0.04114
alternative hypothesis: true location shift is not equal to 0

From this, we can see that the reported \(p\)-value is \(0.04114\).

Question 7 (1 point)

Conduct the test manually. What is the sum of ranks for the first sample?

\[ T_{1}=125.5 \]

Question 8 (1 point)

Conduct the test manually. What is the sum of ranks for the second sample?

\[ T_{2}=64.5 \]

True or False

Question 9 (1 point)

There is evidence against the null hypothesis at the 5% significance level, and we conclude that the two groups differ in their estimates. Whether students were told the population size of the USA or Australia affected their estimate of Canada’s population.

True

Question 10 (1 point)

There is evidence against the null hypothesis at the 5% significance level, and we conclude that the two groups differ in their estimates. Students who were given the population size of Australia tended to give a lower estimate.

False.

The reason is that the test we performed above is two-sided: it is not concerned with which sample tends to give lower values than the other, only with whether there is a difference between the two samples.

If you are interested, the statement would be supported by a one-sided test. Observe:

#######################
# MANN-WHITNEY U TEST
#######################

# samples given 
usa <- c(2, 30, 35, 70, 100, 120, 135, 150, 190, 200)
aus <- c(8, 12, 16, 29, 35, 40, 45, 46, 95)

# mann-whitney u test

wilcox.test(usa, aus, exact=FALSE, alternative="greater") # one-sided test: order matters
# the alternative states that usa values tend to be greater than aus values

    Wilcoxon rank sum test with continuity correction

data:  usa and aus
W = 70.5, p-value = 0.02057
alternative hypothesis: true location shift is greater than 0

The \(p\)-value is \(0.02057<0.05\).

Scenario 2

The table shows the golf scores for 12 members of a college women’s golf team across two rounds of a tournament. (Note: In golf, a lower score indicates better performance.) We want to test whether, in a broader population of collegiate women golfers, performance is better after completing a first round.

Before analysis, the following plot was generated:

Based on the information and plot above, choose the appropriate test to perform using a significance level of 1%.

Round 1 89 90 87 95 86 81 102 105 83 88 91 79
Round 2 94 85 89 89 81 76 107 89 87 91 88 80

True or False

Question 11 (1 point)

The sample size is sufficiently large to use the normal approximation of the test statistic.

True.

We have a matched data set. No golfer has the same score in both rounds, so every pair gives a non-zero difference. This means that we have \(n=12\) differences. Since \(n>10\), we can approximate the distribution of the test statistic with a normal distribution.

Note

What matters is the number of non-zero differences that remain in the final analysis, which must be greater than \(10\). If we start with more than \(10\) pairs but enough of them have a difference of \(0\) that we are left with \(n\le 10\) differences, we cannot use the normal approximation.
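
The count of usable differences can be checked directly. A minimal sketch, using the vector names `rd_one` and `rd_two` from the R code later in this document; `n_eff` and `use_normal_approx` are names introduced here, and the threshold of 10 is the rule of thumb stated above.

```r
# Scores from Scenario 2
rd_one <- c(89, 90, 87, 95, 86, 81, 102, 105, 83, 88, 91, 79)
rd_two <- c(94, 85, 89, 89, 81, 76, 107, 89, 87, 91, 88, 80)

# Paired differences; zero differences are dropped before ranking
d <- rd_one - rd_two
n_eff <- sum(d != 0)
print(n_eff)

# Rule of thumb from the text: normal approximation only if n > 10
use_normal_approx <- n_eff > 10
print(use_normal_approx)
```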

Question 12 (1 point)

This test assumes that the populations are symmetric around their medians.

True.

MCQ

Question 13

Based on the results of the test, choose the correct statement.

  1. There is no evidence against the null hypothesis since the test statistic is smaller than the critical value, suggesting that golfers do not perform better after completing one round.
  2. There is no evidence against the null hypothesis since the test statistic is greater than the critical value, suggesting that golfers do perform better after completing one round.
  3. There is evidence against the null hypothesis since the test statistic is greater than the critical value, suggesting that golfers do perform better after completing one round.
  4. There is evidence against the null hypothesis since the test statistic is smaller than the critical value, suggesting that golfers do not perform better after completing one round.

We have to calculate the test statistic and the critical region.

Round 1 89 90 87 95 86 81 102 105 83 88 91 79
Round 2 94 85 89 89 81 76 107 89 87 91 88 80
Differences -5 5 -2 6 5 5 -5 16 -4 -3 3 -1
Absolute Differences 5 5 2 6 5 5 5 16 4 3 3 1
Rank 8 8 2 11 8 8 8 12 5 3.5 3.5 1
Signed Ranks -8 8 -2 11 8 8 -8 12 -5 -3.5 3.5 -1

We can therefore calculate the test statistic as \(W=\text{sum of the signed ranks}=23\). Then, the corresponding \(z\)-score is

\[ z=\frac{23-0}{\sqrt{\frac{12(12+1)(2(12)+1)}{6}}}=\frac{23}{\sqrt{650}}\approx0.90 \]
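
The hand calculation can be verified with a few lines of R. This is a sketch of the signed-rank computation as set out in the table, with differences taken as Round 1 minus Round 2. It applies no tie or continuity correction, so its \(z\)-score will differ slightly from the one `wilcox.test()` uses internally. The names `signed_ranks`, `W`, `sigma_W` and `z` are introduced here for illustration.

```r
# Scores from Scenario 2
rd_one <- c(89, 90, 87, 95, 86, 81, 102, 105, 83, 88, 91, 79)
rd_two <- c(94, 85, 89, 89, 81, 76, 107, 89, 87, 91, 88, 80)

# Differences (Round 1 - Round 2), dropping any zeros
d <- rd_one - rd_two
d <- d[d != 0]
n <- length(d)

# Rank the absolute differences (average ranks for ties), then restore signs
signed_ranks <- sign(d) * rank(abs(d))
print(signed_ranks)

# Test statistic: sum of the signed ranks (mean 0 under the null hypothesis)
W <- sum(signed_ranks)
print(W)

# Normal approximation: standard deviation of W is sqrt(n(n+1)(2n+1)/6)
sigma_W <- sqrt(n * (n + 1) * (2 * n + 1) / 6)
z <- (W - 0) / sigma_W
print(z)
```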

The critical region, at the \(1\%\) significance level is

#####################
# CRITICAL VALUE
####################

z <- qnorm(0.01, lower.tail=F)
z
[1] 2.326348

\(z>2.33\) since we are considering a right-tailed test. A lower score is better, so if performance improves after the first round we expect \(\text{Round }1-\text{Round }2>0\), given the way we took our differences.

The observed \(z\)-score lies below the critical value, so it does not fall in the critical region. As a result, we fail to reject the null hypothesis at the \(1\%\) significance level. We therefore conclude that there is no significant evidence of better performance after the first round. We choose A.

Note

Had we taken the differences as \(\text{Round }2-\text{Round }1\) instead, better performance would correspond to negative differences and the test would be left-tailed, with critical region \(z<-2.33\). The sign of the test statistic flips as well, so the conclusion is the same.

Question 14 (2 points)

Which of the following correctly states the null and alternative hypotheses for this test?

  1. Null hypothesis (H₀): The median score in Round 1 is equal to the median score in Round 2. Golfers perform the same regardless of whether they have completed a round first.

    Alternative hypothesis (H₁): The median score in Round 2 is greater than the median score in Round 1. Golfers performed better in Round 2.

  2. Null hypothesis (H₀): The median difference between scores in Round 1 and 2 is zero.

    Alternative hypothesis (H₁): The median difference between scores in Round 1 and 2 is greater than zero, implying that golfers perform better after completing one round.

  3. Null hypothesis (H₀): The distribution of scores in Round 1 is the same as in Round 2. Golfers perform the same regardless of whether they have completed a round first.

    Alternative hypothesis (H₁): Scores in Round 2 tend to be lower than in Round 1, they differ in terms of their median only. In other words, golfers performed worse in Round 2.

  4. Null hypothesis (H₀): The median difference between scores in Round 1 and 2 is zero.

    Alternative hypothesis (H₁): The median difference between scores in Round 1 and 2 is less than zero, implying that golfers perform better after completing one round.

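The wording is a little bit fuzzy here, since the options do not say in which order the difference is taken. We took the differences as \(\text{Round }1-\text{Round }2\), and a lower score is better, so better performance after the first round corresponds to a median difference greater than zero. We therefore choose B. D would be the matching choice only if the differences were taken as \(\text{Round }2-\text{Round }1\).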

Numeric Questions

Question 15 (1 point)

What is the sample size used to calculate the test statistic? Enter your answer as a whole number.

12

Question 16 (1 point)

Conduct the test manually. What is the final value of the test statistic used to draw a conclusion? 

0.90

Now, I’m not too sure what this question is really asking. However, I assume that it is talking about the \(z\)-score, \(z=23/\sqrt{650}\approx0.90\), since we directly use it to make a conclusion. The \(W\) statistic is an indirect measure.

Question 17 (1 point)

What is the absolute value of the critical value for this test?

2.33

Question 18 (1 point)

Conduct the test in R with the in-built function. What is the p-value associated with the test statistic?

##################################
# WILCOXON SIGNED RANK SUM TEST
##################################

# given samples
rd_one <- c(89, 90, 87, 95, 86, 81, 102, 105, 83, 88, 91, 79)
rd_two <- c(94, 85, 89, 89, 81, 76, 107, 89, 87, 91, 88, 80)

# test

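wilcox.test(rd_one, rd_two, paired=TRUE, exact=FALSE, alternative="greater")
# differences are rd_one - rd_two; lower scores are better, so improvement
# in Round 2 means positive differences, hence alternative="greater"

    Wilcoxon signed rank test with continuity correction

data:  rd_one and rd_two
V = 50.5, p-value = 0.1922
alternative hypothesis: true location shift is greater than 0

0.1922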

Question 19 (1 point)

What is the mean of the test statistic under the null hypothesis? Enter your answer as a whole number.

0

Question 20 (1 point)

What is the absolute value of the difference for the first golfer? Enter your answer as a whole number.

5

Scenario 3

Question 21 (2 points)

You are analysing students’ satisfaction ratings (on a 1–5 Likert scale) for three different teaching methods. The goal is to determine whether satisfaction levels differ across the teaching methods. The data are collected from different groups of students, each exposed to only one teaching method. Which statistical test would be the most appropriate to analyse this data?

  1. Paired t-test, since the groups are related, and the data are quantitative.
  2. One-way ANOVA, since there are more than two groups, and the response is numerical.
  3. Kruskal–Wallis test, since the data are ordinal and there are more than two independent groups.
  4. Wilcoxon signed-rank test, since the data are ordinal and paired across groups.

We have ordinal data. This rules out a one-way ANOVA and a paired \(t\)-test, since these only deal with quantitative data. It cannot be a Wilcoxon signed-rank test because we do not have paired samples. So, it must be a Kruskal-Wallis test, and we choose C.
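
For reference, a Kruskal-Wallis test on data of this shape could be run as sketched below. The ratings are hypothetical, invented purely for illustration (the quiz provides no data for this question), and the names `method_a`, `method_b`, `method_c`, `ratings` and `kw` are my own.

```r
# Hypothetical satisfaction ratings (1-5 Likert scale), invented for illustration
method_a <- c(4, 5, 3, 4, 5, 4)
method_b <- c(2, 3, 3, 2, 4, 3)
method_c <- c(3, 4, 2, 3, 3, 4)

# Long format: one row per student, with the teaching method as a factor
ratings <- data.frame(
  rating = c(method_a, method_b, method_c),
  method = factor(rep(c("A", "B", "C"),
                      times = c(length(method_a),
                                length(method_b),
                                length(method_c))))
)

# Kruskal-Wallis rank sum test across the three independent groups
kw <- kruskal.test(rating ~ method, data = ratings)
print(kw)
```

With three groups, the test statistic is compared against a chi-squared distribution with \(3-1=2\) degrees of freedom.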

Question 22 (2 points)

A psychologist is studying test anxiety levels in students from two different universities. Each student is asked to rate their anxiety on a scale from 1 (no anxiety) to 5 (extreme anxiety). The students from the two universities form separate, unrelated groups, and the ratings are ordinal. Which statistical test is most appropriate to determine whether anxiety levels differ between the two universities?

  1. Wilcoxon signed-rank test, because there are two groups, and the data are not normally distributed.
  2. Kruskal–Wallis test, because the data are ordinal, and the groups are unrelated.
  3. Paired t-test, because the data are ordinal and come from two groups.
  4. Mann–Whitney U test, because the data are ordinal and come from two independent samples.

We again have ordinal data, which rules out a paired \(t\)-test. In any case, the justification given for the paired \(t\)-test is incorrect: the samples are independent, as stated in the problem. This also rules out a Wilcoxon signed-rank test. A Kruskal-Wallis test is normally reserved for more than two groups, whereas here we have exactly two. A Mann-Whitney U test is appropriate since we have two independent groups for which we are trying to determine a difference in a dependent variable (the anxiety levels of students from two different universities). So, we choose D.
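
As an illustration of the chosen test, the call below mirrors the `wilcox.test()` usage from Scenario 1. The anxiety ratings are hypothetical, invented only to show the shape of the call, and `uni_a`, `uni_b` and `mw` are names of my own. `exact=FALSE` is used because Likert data contain many ties, for which an exact p-value cannot be computed.

```r
# Hypothetical anxiety ratings (1 = no anxiety, 5 = extreme), invented for illustration
uni_a <- c(2, 3, 3, 4, 5, 4, 3)
uni_b <- c(1, 2, 2, 3, 3, 2, 4, 1)

# Two-sided Mann-Whitney U test on two independent samples
mw <- wilcox.test(uni_a, uni_b, exact = FALSE)
print(mw)
```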