1. Comparing two related samples

ONE

A teacher wished to determine if providing a bilingual dictionary to students with limited English proficiency improves math test scores. A small class of students (n = 10) was selected. Students were given two math tests. Each test covered the same type of math content; however, students were provided a bilingual dictionary on the second test. The data in Table 3.10 represent the students’ performance on each math test.

Table 3.10
student	Math test without a bilingual dictionary	Math test with a bilingual dictionary
1	30	39
2	56	46
3	48	37
4	47	44
5	43	32
6	45	39
7	36	41
8	44	40
9	44	38
10	40	46

Use a one-tailed Wilcoxon signed-rank test and a one-tailed sign test to determine which testing condition resulted in higher scores. Use $\alpha = 0.05$. Report your findings.

\[{H}_{0}:\text{The bilingual dictionary has no effect on math test scores}\\ \\ {H}_{1}:\text{Math test scores are higher without the bilingual dictionary}\]

## One-tailed Wilcoxon signed-rank test
## ---------------------------------------
## p-value: 0.1099
## Decision: Fail to reject the null hypothesis
## 
## One-tailed Sign Test
## --------------------------
## p-value: 0.1719
## Decision: Fail to reject the null hypothesis

A one-tailed Wilcoxon signed-rank test and a one-tailed sign test were conducted to determine whether students performed better on the math test without a bilingual dictionary, with $\alpha = 0.05$ and $n = 10$. Median scores were $44.0 \text{ (without dictionary) } \text{ and } 39.5 \text{ (with dictionary) }$. For the Wilcoxon signed-rank test, the result was non-significant $(V = 40, p = 0.1099)$. The sign test yielded 7 positive differences and 3 negative differences, also non-significant $(p = 0.1719)$. Both tests fail to reject ${H}_{0}$ — there is insufficient evidence to conclude that students scored higher without the bilingual dictionary.
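The statistics quoted above can be reproduced by hand from Table 3.10. The following is a minimal Python sketch, not the code that produced the output above; the function names `signed_rank_v` and `sign_test_upper` are illustrative, and differences are assumed to be taken as (without dictionary) minus (with dictionary).

```python
from math import comb

# Scores from Table 3.10.
without = [30, 56, 48, 47, 43, 45, 36, 44, 44, 40]
with_dict = [39, 46, 37, 44, 32, 39, 41, 40, 38, 46]

# Paired differences (without - with); zero differences would be dropped.
diffs = [a - b for a, b in zip(without, with_dict) if a != b]


def signed_rank_v(d):
    """Sum of the average ranks of |d| that belong to positive differences."""
    ordered = sorted(abs(x) for x in d)
    avg_rank = {}
    for value in set(ordered):
        first = ordered.index(value) + 1          # 1-based rank of first occurrence
        last = first + ordered.count(value) - 1   # rank of last occurrence
        avg_rank[value] = (first + last) / 2      # tied values share the average rank
    return sum(avg_rank[abs(x)] for x in d if x > 0)


def sign_test_upper(d):
    """One-tailed sign test: P(X >= number of positives) under Binomial(n, 0.5)."""
    n = len(d)
    positives = sum(x > 0 for x in d)
    return sum(comb(n, k) for k in range(positives, n + 1)) / 2 ** n


v_stat = signed_rank_v(diffs)
n_pos = sum(x > 0 for x in diffs)
n_neg = sum(x < 0 for x in diffs)
p_sign = sign_test_upper(diffs)
print(v_stat, n_pos, n_neg, round(p_sign, 4))
```

The sketch covers only the test statistic and the exact sign-test p-value; the Wilcoxon p-value additionally depends on how ties are handled by the software used.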

TWO

A research study was done to investigate the influence of being alone at night on the human male heart rate. Ten men were sent into a wooded area, one at a time, at night, for 20 minutes. They had a heart monitor to record their pulse rate. The second night, the same men were sent into a similar wooded area accompanied by a companion. Their pulse rate was recorded again. The researcher wanted to see if having a companion would change their pulse rate. The median pulse rates are recorded in Table 3.11. Use a two-tailed Wilcoxon signed-rank test and a two-tailed sign test to determine which condition produced a higher pulse rate. Use $\alpha = 0.05$. Report your findings.

Table 3.11
participant	median rate alone	median rate with companion
A	88	72
B	77	74
C	91	80
D	70	77
E	80	71
F	85	83
G	90	80
H	82	91
I	93	86
J	75	69

\[{H}_{0}:\text{There is no difference in pulse rate between being alone and having a companion}\\ \\ {H}_{1}:\text{There is a difference in pulse rate between the two conditions}\]

## Two-tailed Wilcoxon signed-rank test
## ---------------------------------------
## p-value: 0.1025
## Decision: Fail to reject the null hypothesis
## 
## Two-tailed Sign Test
## --------------------------
## p-value: 0.1094
## Decision: Fail to reject the null hypothesis

Two-tailed Wilcoxon signed-rank and sign tests were conducted to assess whether having a companion affected heart rate among $n = 10$ male participants, with $\alpha = 0.05$. Median pulse rates were $83.5 \text{ bpm (alone) and }78.5 \text{ bpm (with companion) }$. The Wilcoxon signed-rank test was non-significant $(V = 44, p = 0.1025)$. The sign test recorded 8 positive and 2 negative differences, also non-significant $(p = 0.1094)$. Both tests fail to reject ${H}_{0}$: the data provide insufficient evidence of a difference in pulse rate between the two conditions.

THREE

A researcher conducts a pilot study to compare two treatments to help obese female teenagers lose weight. She tests each individual in two different treatment conditions. The data in Table 3.12 provides the number of pounds that each participant lost.

Table 3.12
participant	treatment 1	treatment 2
1	10	18
2	20	12
3	15	16
4	9	7
5	18	21
6	11	17
7	6	13
8	12	14

Use a two-tailed Wilcoxon signed-rank test and a two-tailed sign test to determine which treatment resulted in greater weight loss. Use $\alpha = 0.05$. Report your findings

\[{H}_{0}:\text{There is no difference in weight loss between treatment 1 and 2}\\ \\ {H}_{1}:\text{There is a difference in weight loss between the two treatments}\]

## Two-tailed Wilcoxon signed-rank test
## ---------------------------------------
## p-value: 0.2924
## Decision: Fail to reject the null hypothesis
## 
## Two-tailed Sign Test
## --------------------------
## p-value: 0.2891
## Decision: Fail to reject the null hypothesis

Two-tailed Wilcoxon signed-rank and sign tests were applied to compare weight loss outcomes between two treatments among $n = 8$ female participants, with $\alpha = 0.05$. Median weight loss was $11.5 \text{ lbs under treatment 1 and }15.0 \text{ lbs under treatment 2 }$. The Wilcoxon signed-rank test was non-significant $(V = 10, p = 0.2924)$. The sign test yielded only 2 positive differences and 6 negative differences, also non-significant $(p = 0.2891)$. Both tests fail to reject ${H}_{0}$: there is no statistically significant difference in weight loss between the two treatments, although the descriptive pattern leans in favour of treatment 2.

FOUR

Twenty participants in an exercise program were measured on the number of sit-ups they could do before physical exercise (first count) and the number they could do after they had done at least 45 minutes of other physical exercise (second count). Table 3.13 shows the results for 20 participants obtained during two separate physical exercise sessions. Determine the effect size for a calculated z-score.

## Effect size: 0.4591

This constitutes a medium-to-large effect, indicating that the exercise session had a practically meaningful impact on sit-up performance.
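A common way to turn a z-score into an effect size is $r = |z|/\sqrt{n}$. The sketch below assumes that convention; the document does not state which formula produced 0.4591, and the z-score used in the example is hypothetical rather than taken from Table 3.13.

```python
from math import sqrt


def effect_size_r(z, n):
    """Effect size r = |z| / sqrt(n) for a z-score based on n observations."""
    return abs(z) / sqrt(n)


# Hypothetical values for illustration only (not from Table 3.13).
example = effect_size_r(-2.0, 16)
print(example)
```

Under the usual rule of thumb, values near 0.3 are read as medium and values near 0.5 as large.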

FIVE

A school is trying to get more students to participate in activities that will make learning more desirable. Table 3.14 shows the number of activities that each of the 10 students in one class participated in last year before a new activity program was implemented and this year after it was implemented. Construct a 95% confidence interval based on the Wilcoxon signed-rank test to determine whether the new activity program had a significant positive effect on student participation.

\[{H}_{0}:\text{The new activity program has no effect on student participation} \\ \\ {H}_{1}:\text{The new activity program has a significant positive effect on student participation}\]

## [-6.9999,1]

Since the above interval contains zero, we fail to reject ${H}_{0}$— there is insufficient evidence at $\alpha = 0.05$ to conclude that the new activity program had a significant positive effect on student participation.

SIX

Samples of cream from each of ten dairies (A to J) are each divided into two portions. One portion from each is sent to Laboratory I, the other to Laboratory II, for bacterium counts. The counts (thousands bacteria ${ml}^{-1}$) are:

Dairy	Lab I	Lab II
A	11.7	10.9
B	12.1	11.9
C	13.3	13.4
D	15.1	15.4
E	15.9	14.8
F	15.3	14.8
G	11.9	12.3
H	16.2	15.0
I	15.1	14.2
J	13.6	13.1

Use the Wilcoxon signed-rank test to assess the evidence for any consistent difference between laboratories for subsamples from the same dairy. Obtain also 95 and 99 percent confidence intervals for the mean and compare these with the intervals using the optimal method when normality is assumed

\[{H}_{0}:\text{There is no consistent difference in bacteria counts between laboratory I and laboratory II}\\ \\ {H}_{1}:\text{There is a consistent difference in bacteria counts between the two laboratories}\]

## Two-tailed Wilcoxon signed-rank test
## ------------------------------------------------
## p-value: 0.0526
## Decision: Fail to reject the null hypothesis

A two-tailed Wilcoxon signed-rank test was conducted to assess whether a consistent difference existed between bacteria counts $(\text{thousands }{ml}^{-1})$ recorded by two laboratories across $n = 10$ dairy subsamples, with $\alpha = 0.05$. Median counts were $14.35 \text{ (Lab I) and }13.80\text{ (Lab II) }$. The test was marginally non-significant $(V = 47, p = 0.0526)$. We fail to reject ${H}_{0}$: the evidence for a consistent laboratory difference falls just short of the $5\%$ threshold. The Hodges-Lehmann estimate of the location shift was $0.450\text{ thousand bacteria } {ml}^{-1}$.

The nonparametric 95% confidence interval for the difference was $[-0.05,0.8999]$ and the 99% confidence interval was $[-0.35,1.15]$, both straddling zero and consistent with the test decision.
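The Hodges-Lehmann estimate for paired data is the median of the Walsh averages, that is, of all pairwise means $(d_i + d_j)/2$ with $i \le j$ of the differences. A minimal Python sketch of that calculation for the dairy data follows; the variable names are illustrative, and it does not attempt to reproduce the confidence limits, which depend on the software's handling of ties.

```python
from statistics import median

# Bacterium counts (thousands per ml) for dairies A to J.
lab1 = [11.7, 12.1, 13.3, 15.1, 15.9, 15.3, 11.9, 16.2, 15.1, 13.6]
lab2 = [10.9, 11.9, 13.4, 15.4, 14.8, 14.8, 12.3, 15.0, 14.2, 13.1]

# Paired differences Lab I - Lab II, rounded to the recording precision
# so that floating-point noise does not disturb the ordering.
d = [round(a - b, 1) for a, b in zip(lab1, lab2)]

# Walsh averages: every pairwise mean (d_i + d_j) / 2 with i <= j.
walsh = [(d[i] + d[j]) / 2 for i in range(len(d)) for j in range(i, len(d))]

# The Hodges-Lehmann estimate is the median of the Walsh averages.
hl_estimate = median(walsh)
print(len(walsh), round(hl_estimate, 3))
```

With $n = 10$ differences there are $n(n+1)/2 = 55$ Walsh averages, so the estimate is the 28th smallest.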

SEVEN

For each of nine matched pairs of students, one student is allocated to a series of lectures and the other to appropriate computer assisted learning (CAL) material. At the end of the course, the students are given the same examination paper. The marks achieved (out of 100) are:

pair	CAL	lectures
1	50	25
2	56	58
3	51	65
4	46	38
5	88	91
6	79	32
7	81	31
8	95	13
9	73	49

\[{H}_{0}:\text{There is no difference in examination performance between CAL and lecture-based learning}\\ \\ {H}_{1}:\text{There is a difference in examination performance between the two methods}\]

Analyze these results by what you consider the most appropriate parametric or nonparametric methods to determine whether or not they provide acceptable evidence that CAL materials lead to better examination results.

## Two-tailed Wilcoxon signed-rank test
## ------------------------------------------------
## p-value: 0.0742
## Decision: Fail to reject the null hypothesis

A two-tailed Wilcoxon signed-rank test was used to compare examination marks (out of 100) between students assigned to computer-assisted learning (CAL) and those assigned to lectures, across $n = 9$ matched pairs, with $\alpha = 0.05$ . Median marks were $73\text{ (CAL) and }38\text{ (lectures) }$. Of the $9$ pairs, $6$ showed higher marks under CAL and $3$ under lectures. Despite this descriptive advantage, the test was non-significant $(V = 38, p = 0.0742)$. We fail to reject ${H}_{0}$— there is insufficient evidence at the $5\%$ level to conclude that CAL and lecture-based learning produce different examination outcomes.

2. Comparing two unrelated samples

ONE

The data in Table 4.8 were obtained from a reading-level test for 1st grade children. Compare the performance gains of the two different methods for teaching reading.

Table 4.8
Method	Gain score	Method	Gain score
One on one	16	Small group	11
One on one	13	Small group	2
One on one	16	Small group	10
One on one	16	Small group	4
One on one	13	Small group	9
One on one	9	Small group	8
One on one	12	Small group	5
One on one	12	Small group	6
One on one	20	Small group	4
One on one	17	Small group	16

Use two-tailed Mann-Whitney U and Kolmogorov-Smirnov two-sample tests to determine which method was better for teaching reading. Set $\alpha = 0.05$. Report your findings

\[{H}_{0}:\text{There is no difference in reading gain scores between one-on-one and small group instruction}\\ \\ {H}_{1}:\text{There is a difference in reading gain scores between the two methods}\]

## Two-tailed Mann-Whitney U test
## ---------------------------------------
## p-value: 0.0021
## Decision: Reject the null hypothesis
## 
## Kolmogorov-Smirnov two-sample test
## ------------------------------------
## p-value: 0.0021
## Decision: Reject the null hypothesis

Two-tailed Mann-Whitney U and Kolmogorov-Smirnov two-sample tests were conducted to compare reading gain scores between one-on-one and small group instruction among ${n}_{1}={n}_{2}=10$ first-grade children, with $\alpha = 0.05$. Median gain scores were $14.5\text{ (one-on-one) and }7.0\text{ (small group). }$ Both tests were significant — the MWU test $(W = 91, p = 0.0021)$ and the KS test $(D = 0.8, p = 0.0021)$. We reject the null hypothesis under both tests. There is strong evidence that one-on-one instruction produces higher reading gain scores than small group instruction.
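Both test statistics can be checked directly from Table 4.8. The sketch below is a plain-Python illustration with hypothetical helper names (`mann_whitney_u`, `ks_distance`); it computes only the statistics, not the p-values, and counts tied pairs as one half in the U statistic.

```python
# Gain scores from Table 4.8.
one_on_one = [16, 13, 16, 16, 13, 9, 12, 12, 20, 17]
small_group = [11, 2, 10, 4, 9, 8, 5, 6, 4, 16]


def mann_whitney_u(x, y):
    """U for sample x: each pair with x > y counts 1, each tied pair 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def ks_distance(x, y):
    """Largest vertical gap between the two empirical CDFs."""
    def ecdf(sample, t):
        return sum(v <= t for v in sample) / len(sample)
    # The gap can only change at observed values, so checking those suffices.
    return max(abs(ecdf(x, t) - ecdf(y, t)) for t in set(x) | set(y))


u_stat = mann_whitney_u(one_on_one, small_group)
d_stat = ks_distance(one_on_one, small_group)
print(u_stat, round(d_stat, 3))
```

The largest gap occurs at a gain score of 11, where 9 of the 10 small-group scores but only 1 of the 10 one-on-one scores have been reached.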

TWO

Table 4.10
Method 1	Method 2
53	91
41	18
17	14
45	21
44	23
12	99
49	16
50	10

Table 4.10 shows assessment scores of two different classes who are being taught computer skills using two different methods

Use two-tailed Mann-Whitney U and Kolmogorov-Smirnov two-sample tests to determine which method was better for teaching computer skills. Set $\alpha = 0.05$. Report your findings.

\[{H}_{0}:\text{There is no difference in computer skills assessment scores between method 1 and method 2}\\ \\ {H}_{1}:\text{There is a difference in assessment scores between the two methods}\]

## Two-tailed Mann-Whitney U test
## ---------------------------------------
## p-value: 0.4418
## Decision: Fail to reject the null hypothesis
## 
## Kolmogorov-Smirnov two-sample test
## ------------------------------------
## p-value: 0.2827
## Decision: Fail to reject the null hypothesis

Two-tailed Mann-Whitney U and Kolmogorov-Smirnov tests were applied to compare computer skills assessment scores between two teaching methods $({n}_{1}={n}_{2}=8)$, with $\alpha = 0.05$. Median scores were $44.5\text{ (method 1) and }19.5\text{ (method 2) }$. Despite the descriptive difference, neither test was significant: MWU $(W = 40, p = 0.4418)$ and KS $(D = 0.5, p = 0.2827)$. Both tests fail to reject ${H}_{0}$. The two extreme scores of 91 and 99 in method 2 take the two highest ranks and offset its otherwise lower scores, and with only 8 observations per group the tests have little power. The data do not provide sufficient evidence to conclude that one teaching method is superior.

THREE

Two methods were used to provide instruction in Science for 7th Grade. Method 1 included a laboratory each week and method 2 had only classroom work with lecture and worksheets. Table 4.12 shows end-of-term test performance for the two methods. Construct a 95% median confidence interval based on the difference between two independent samples to compare the two methods.

Table 4.12
Method 1	Method 2
15	8
23	15
9	10
12	13
18	17
22	5
17	18
20	7

\[{H}_{0}:\text{There is no difference in end-of-term test performance between laboratory and classroom instruction}\\ \\ {H}_{1}:\text{There is a difference in end-of-term test performance between the two methods}\]

## 95% confidence interval between the scores of the two methods:
## [0, 12]

A two-tailed Mann-Whitney U test with a 95% confidence interval was conducted to compare end-of-term science scores between laboratory-based (method 1) and classroom-based (method 2) instruction, with ${n}_{1}={n}_{2}=8$ and $\alpha = 0.05$. Median scores were $17.5\text{ (method 1) and }11.5\text{ (method 2) }$. The test was marginally non-significant $(W = 50.5, p = 0.0582)$. We fail to reject ${H}_{0}$ at the $5\%$ level. The Hodges-Lehmann estimate of the median difference was $5.0$ marks in favour of method 1, with a $95\%$ CI of $[0, 12]$. The lower bound of zero indicates that the interval just touches the null value, consistent with the borderline p-value. While the evidence leans toward a method 1 advantage, the small sample size limits power and no definitive conclusion can be drawn.

FOUR

An alloy is composed of zinc, copper and tin. It may be made at one of two temperatures H (higher) or L (lower). We wish to know if one temperature produces a harder alloy. A sample is taken from each of 9 batches at L and 7 at H. To arrange them in ascending order of hardness, all specimens are scraped against one another to see which makes a deeper scratch (a deeper scratch indicates a softer specimen). On this basis the specimens are ranked 1 (softest) to 16 (hardest) with the results given below. Should we reject the hypothesis that hardness is unaffected by temperature?

rank	temperature
1	H
2	L
3	H
4	H
5	H
6	L
7	H
8	L
9	L
10	H
11	H
12	L
13	L
14	L
15	L
16	L

\[{H}_{0}:\text{Alloy hardness is unaffected by production temperature}\\ \\ {H}_{1}:\text{Alloy hardness differs between the production temperatures}\]

## Mann-Whitney U test
## ---------------------
## p-value: 0.0549
## Decision: Fail to reject the null hypothesis

A two-tailed Mann-Whitney U test with exact p-value computation was used to determine whether production temperature affected alloy hardness, with ${n}_{H}=7$ batches at higher temperature and ${n}_{L}=9$ at lower temperature, and $\alpha = 0.05$. Specimens were ranked 1 (softest) to 16 (hardest). Rank sums were $41\text{ (H) and }95\text{ (L), with median ranks of }5\text{ (H) and }12\text{ (L), }$suggesting that lower temperature batches tended to produce harder alloys. However, the test was marginally non-significant $(W = 13, p = 0.0549)$. We fail to reject ${H}_{0}$ — there is insufficient evidence at the $5\%$ level to conclude that temperature affects hardness, though the result is borderline and warrants caution. A larger sample would help clarify whether the observed ranking pattern reflects a true population difference.
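Because the data are already ranks with no ties, the exact p-value can be checked by brute force: under ${H}_{0}$ every choice of 7 of the 16 ranks for the H batches is equally likely. The following Python sketch enumerates them; it assumes the two-sided p-value is twice the lower-tail probability, which is appropriate here because the observed statistic lies below its null mean of $7 \times 9 / 2 = 31.5$.

```python
from itertools import combinations

# Ranks held by the 7 specimens made at the higher temperature (H).
ranks_h = [1, 3, 4, 5, 7, 10, 11]
n_h, n_total = len(ranks_h), 16

# U (reported as W) = rank sum minus its smallest possible value.
offset = n_h * (n_h + 1) // 2
u_obs = sum(ranks_h) - offset

# Enumerate every equally likely assignment of 7 ranks out of 16.
count_le = 0
total = 0
for subset in combinations(range(1, n_total + 1), n_h):
    total += 1
    if sum(subset) - offset <= u_obs:
        count_le += 1

# Two-sided p-value as twice the lower tail (assumption stated above).
p_two_sided = min(1.0, 2 * count_le / total)
print(u_obs, count_le, total, round(p_two_sided, 4))
```

The enumeration visits only 11,440 subsets, so it runs almost instantly at this sample size.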

FIVE

A psychologist notes total time (in seconds) needed to perform a series of simple manual tasks for each of eight children with learning difficulties and seven children without learning difficulties. The times are:

without difficulties	with difficulties
204	243
218	228
197	261
183	202
227	343
233	242
191	220
	239

Use a Smirnov test to find whether the psychologist is justified in asserting these samples are likely to be from different populations.

\[{H}_{0}:\text{Children with and without learning difficulties are drawn from the same population}\\ \\ {H}_{1}:\text{The two groups are drawn from different populations}\]

## Kolmogorov-Smirnov two-sample test
## ------------------------------------
## p-value: 0.0559
## Decision: Fail to reject the null hypothesis

A two-sample Kolmogorov-Smirnov (Smirnov) test with exact p-value computation was conducted to determine whether task completion times differed between ${n}_{1}=8$ children with learning difficulties and ${n}_{2}=7$ children without, with $\alpha = 0.05$. Median times were $240.5$ seconds (with difficulties) and $204.0$ seconds (without difficulties), suggesting the group with learning difficulties tended to take longer. The test was marginally non-significant $(D = 0.625, p = 0.0559)$. We fail to reject ${H}_{0}$ at the $5\%$ level: the data do not provide sufficient evidence to justify the psychologist’s assertion that the two samples are from different populations, though the result is borderline and the descriptive pattern is consistent with the hypothesis.

SIX

The following data are DMF scores for 34 male and 54 female first-year dental students. The DMF score is the total of the numbers of decayed + missing + filled teeth.

gender	DMF
male	8
male	6
male	4
male	2
male	10
male	5
male	6
male	6
male	19
male	4
male	10
male	4
male	10
male	12
male	7
male	2
male	5
male	1
male	8
male	2
male	0
male	7
male	6
male	4
male	4
male	11
male	2
male	16
male	8
male	7
male	8
male	4
male	0
male	2
female	4
female	7
female	13
female	8
female	8
female	4
female	14
female	5
female	6
female	4
female	12
female	9
female	9
female	9
female	8
female	12
female	4
female	8
female	8
female	4
female	11
female	6
female	15
female	9
female	8
female	14
female	9
female	8
female	9
female	7
female	12
female	11
female	7
female	4
female	10
female	7
female	8
female	8
female	7
female	9
female	10
female	16
female	14
female	15
female	10
female	4
female	6
female	3
female	9
female	3
female	10
female	3
female	8

Use an asymptotic WMW test to determine whether the DMF score differs significantly between males and females.

\[{H}_{0}:\text{DMF scores are identically distributed between male and female first year dental students}\\ \\ {H}_{1}:\text{DMF scores differ between male and female first-year students}\]

## Mann-Whitney U test
## ---------------------
## p-value: 0.0045
## Decision: Reject the null hypothesis

An asymptotic Wilcoxon-Mann-Whitney test (normal approximation) was applied to compare DMF scores between ${n}_{m}=34$ male and ${n}_{f}=54$ female first-year dental students, with $\alpha = 0.05$. The asymptotic form was appropriate here given the large combined sample size $(N=88)$ and the presence of ties, which preclude exact computation. Median DMF scores were $6$ (male) and $8$ (female), with means of $6.18$ and $8.33$ respectively, suggesting females tended to have higher DMF scores. The test was significant $(W = 587.5, p = 0.0045)$. We reject the null hypothesis. DMF scores differ significantly between male and female dental students, with females exhibiting a higher burden of decayed, missing and filled teeth.

3. Comparing more than two related samples

ONE

A graduate student performed a pilot study for his dissertation. He wanted to examine the effects of animal companionship on elderly males. He selected 10 male participants from a nursing home. Then he used an ABAB research design, where A represented a week with the absence of a cat and B represented a week with the presence of a cat. At the end of each week, he administered a 20-point survey to measure the quality of life satisfaction. The survey results are presented in Table 5.9

Table 5.9
participant	week 1	week 2	week 3	week 4
1	7	6	8	9
2	9	8	10	1
3	15	18	16	17
4	7	6	8	9
5	7	8	10	11
6	10	14	13	11
7	12	19	11	13
8	7	4	2	5
9	8	7	9	5
10	12	16	14	15

Use a Friedman test to determine if one or more of the groups are significantly different. Since this is a pilot study, use $\alpha = 0.10$. If a significant difference exists, use Wilcoxon signed-rank tests to identify which groups are significantly different. Use the Bonferroni procedure to limit the type I error rate. Report your findings

\[{H}_{0}:\text{The quality of life satisfaction scores are identical across all four weeks} \\ {H}_{1}:\text{At least one week differs in quality of life satisfaction}\]

## Friedman test
## ------------------------------------------------
## p-value: 0.5399
## Decision: Fail to reject the null hypothesis

A Friedman test was conducted to examine the effect of animal companionship (ABAB design) on quality of life satisfaction among $n=10$ elderly male participants across four weeks, with $\alpha = 0.10$ as specified for this pilot study. Median scores were $8.5, 8.0, 10.0, \text{ and } 10.0$ for weeks 1 through 4 respectively. The omnibus test was non-significant $({\chi^2}_{(3)}=2.16, p = 0.5399)$. We therefore fail to reject ${H}_{0}$. The data provide insufficient evidence of a significant difference in quality of life satisfaction across the four weeks. Since the Friedman test was non-significant, post-hoc Wilcoxon signed-rank comparisons were not warranted.

TWO

A physical education teacher conducted an action research project to examine a strength and conditioning program. Using 12 male participants, she measures the number of curl ups they could do in 1 minute. She measured their performance before the program. Then, she measured their performance at 1 month intervals. Table 5.10 presents the performance results.

Table 5.10: Number of curl ups in a minute
participant	baseline	month 1	month 2
1	66	67	69
2	49	50	56
3	51	52	49
4	65	65	69
5	42	43	46
6	38	39	40
7	33	31	39
8	41	41	44
9	46	47	48
10	45	46	46
11	36	33	34
12	51	55	67

\[{H}_{0}:\text{The number of curl-ups is identical across all three time points} \\ \\ {H}_{1}:\text{Performance increases over time}\]

Use a Friedman test with $\alpha = 0.05$ to determine if one or more of the groups are significantly different. The teacher is expecting performance gains, so if a significant difference exists, use one-tailed Wilcoxon signed-rank tests to identify which groups are significantly different. Use the Bonferroni procedure to limit the type I error rate. Report your findings.

## Friedman test
## ------------------------------------------------
## p-value: 0.0041
## Decision: Reject the null hypothesis

A Friedman test was used to evaluate the effect of a strength and conditioning program on curl-up performance across three time points (baseline, month 1, month 2) for $n = 12$ male participants, with $\alpha = 0.05$. Median scores were $45.5, 46.5, \text{ and } 47.0$ respectively, suggesting a progressive increase. The Friedman test was significant $({\chi^2}_{(2)}=10.9778, p = 0.0041)$ indicating at least one time point differed.
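The Friedman statistic can be reproduced from Table 5.10 by ranking within each participant and applying a correction for tied values. The sketch below is an illustration under stated assumptions rather than the code behind the output above: the helper names are hypothetical, the tie correction is the usual one that shrinks the denominator, and the closed-form p-value $e^{-\chi^2/2}$ is valid only because three conditions give 2 degrees of freedom.

```python
from math import exp

# Curl-ups per participant: (baseline, month 1, month 2), from Table 5.10.
curl_ups = [
    (66, 67, 69), (49, 50, 56), (51, 52, 49), (65, 65, 69),
    (42, 43, 46), (38, 39, 40), (33, 31, 39), (41, 41, 44),
    (46, 47, 48), (45, 46, 46), (36, 33, 34), (51, 55, 67),
]


def average_ranks(row):
    """Within-row ranks, with tied values sharing their average rank."""
    ordered = sorted(row)
    return [(2 * ordered.index(v) + ordered.count(v) + 1) / 2 for v in row]


def friedman_statistic(rows):
    """Tie-corrected Friedman chi-square and the column rank sums."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        # Each group of t tied values in a row contributes t^3 - t.
        tie_term += sum(row.count(v) ** 3 - row.count(v) for v in set(row))
    centre = n * (k + 1) / 2  # expected rank sum under the null hypothesis
    numerator = 12 * sum((r - centre) ** 2 for r in rank_sums)
    denominator = n * k * (k + 1) - tie_term / (k - 1)
    return numerator / denominator, rank_sums


chi2, rank_sums = friedman_statistic(curl_ups)
p_value = exp(-chi2 / 2)  # chi-square upper tail, valid for 2 degrees of freedom only
print(rank_sums, round(chi2, 4), round(p_value, 4))
```

The rank sums rise from baseline to month 2, which is the pattern the omnibus test is detecting.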

We therefore proceed to post-hoc Wilcoxon signed-rank tests to locate the source of the difference. The results below are judged against the Bonferroni-adjusted significance level $0.05/3=0.0167$:

## Bonferroni-Wilcoxon Signed Rank test (baseline vs month 1)
## ------------------------------------------------------------
## p-value: 0.1449
## Decision: Fail to reject the null hypothesis
## 
## Bonferroni-Wilcoxon Signed Rank test (baseline vs month 2)
## -------------------------------------------------------------
## p-value: 0.0065
## Decision: Reject the null hypothesis
## 
## Bonferroni-Wilcoxon Signed Rank test (month 1 vs month 2)
## -------------------------------------------------------------
## p-value: 0.009
## Decision: Reject the null hypothesis

The comparison of baseline vs month 1 was non-significant $(W = 17, p = 0.1449)$. However, baseline vs month 2 was significant $(W = 7, p = 0.0065)$, as was month 1 vs month 2 $(W = 6, p = 0.009)$. The conditioning program produced a significant improvement in curl-up performance by month 2, with meaningful gains observed between month 1 and month 2 as well, but no detectable change from baseline to month 1.
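The Bonferroni decision rule compares each post-hoc p-value with $\alpha$ divided by the number of comparisons. A small Python sketch of that rule follows, applied to the three p-values reported above; the function name `bonferroni_decisions` is illustrative.

```python
def bonferroni_decisions(p_values, alpha=0.05):
    """Return the adjusted threshold and, per comparison, whether H0 is rejected."""
    threshold = alpha / len(p_values)
    return threshold, {name: p < threshold for name, p in p_values.items()}


# p-values as reported in the post-hoc output above.
post_hoc = {
    "baseline vs month 1": 0.1449,
    "baseline vs month 2": 0.0065,
    "month 1 vs month 2": 0.009,
}

threshold, decisions = bonferroni_decisions(post_hoc)
print(round(threshold, 4), decisions)
```

An equivalent approach multiplies each p-value by the number of comparisons and compares the result with the unadjusted $\alpha$.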

THREE

At the beginning of a session, 12 names are read out in random order to 10 students. Four are names of prominent sporting personalities (Group A), four of national and international politicians (Group B) and four of local dignitaries (Group C). At the end of the session students are asked to recall as many of the names as possible. The numbers recalled were:

student	group A	group B	group C
I	3	2	0
II	1	1	0
III	2	3	1
IV	4	3	2
V	3	2	2
VI	1	0	0
VII	3	2	4
VIII	3	2	1
IX	2	2	0
X	4	3	2

Rank the data within each block (student) and use a Friedman test to assess evidence of a difference between recall rates for the three groups. In particular, is the recall rate for group B and/or group C significantly lower than that for group A? Carry out an ANOVA on the given data. Do the conclusions agree with the Friedman test? If not, why not?

\[{H}_{0}:\text{Recall rates are identical across all three name categories}\\ \\ {H}_{1}:\text{At least one category has a different recall rate}\]

## Friedman Test
## -----------------
## p-value: 0.0043
## Decision: Reject the null hypothesis

A Friedman test was performed to assess differences in recall rates across three name categories (sporting personalities, A; politicians, B; local dignitaries, C) among $n=10$ students, with $\alpha = 0.05$. Median recall counts were $3, 2, \text{ and } 1$ for groups A, B, and C respectively. The test was significant $({\chi^2}_{(2)}=10.8889, p = 0.0043)$, so we reject the null hypothesis.

ANOVA with student as a blocking factor

##             Df Sum Sq Mean Sq F value  Pr(>F)   
## grp          2  9.867   4.933   9.380 0.00162 **
## student      9 24.533   2.726   5.183 0.00149 **
## Residuals   18  9.467   0.526                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

An ANOVA with name category as the treatment factor and student as a blocking factor also yielded a significant group effect $({F}_{(2,18)}=9.38, p = 0.002)$, consistent with the Friedman result. The two tests agree here because the data, while discrete and bounded, do not show severe departures from the ANOVA assumptions at this sample size.

Post-hoc Bonferroni Wilcoxon Signed Rank tests

## Bonferroni-Wilcoxon Signed Rank test (group A vs group B)
## ------------------------------------------------------------
## p-value: 0.0411
## Decision: Fail to reject the null hypothesis
## 
## Bonferroni-Wilcoxon Signed Rank test (group A vs group C)
## -------------------------------------------------------------
## p-value: 0.0126
## Decision: Reject the null hypothesis
## 
## Bonferroni-Wilcoxon Signed Rank test (group B vs group C)
## -------------------------------------------------------------
## p-value: 0.1142
## Decision: Fail to reject the null hypothesis

The comparison A vs B was non-significant $(V = 31.5, p = 0.0411)$: the p-value is below the unadjusted $0.05$ level but above the Bonferroni-adjusted threshold of $0.05/3 = 0.0167$. The comparison A vs C was significant $(V = 52, p = 0.0126)$, confirming that group A was recalled significantly more than group C. The comparison B vs C was non-significant $(V = 29.5, p = 0.1142)$. Sporting personalities were recalled at a significantly higher rate than local dignitaries, though no statistically significant difference was established between politicians and sporting personalities or between politicians and local dignitaries after Bonferroni correction.

FOUR

Four share tipsters are each asked to predict on 10 randomly selected days whether the London FTSE Index (commonly known as Footsie) will rise or fall on the following day. If they predict correctly this is scored as 1, if incorrectly as 0. Do the scores below indicate differences in tipsters’ ability to predict accurately?

day	tipster 1	tipster 2	tipster 3	tipster 4
1	1	1	1	1
2	0	1	1	1
3	0	1	0	0
4	1	1	1	0
5	1	0	1	0
6	1	1	1	1
7	1	1	1	1
8	0	0	1	1
9	1	0	0	0
10	1	0	1	1

\[{H}_{0}:\text{There is no difference in prediction accuracy among the four tipsters} \\ \\ {H}_{1}:\text{At least one tipster differs in prediction accuracy}\]

## Cochran's Q test
## ------------------
## p-value: 0.6974
## Decision: Fail to reject the null hypothesis

Cochran’s Q test was applied to binary prediction outcomes (1 = correct, 0 = incorrect) from four share tipsters over 10 days, with $\alpha = 0.05$. Correct prediction counts were $7, 6, 8, \text{ and } 6$ out of $10$ for tipsters 1 through 4 respectively. The test was non-significant $({Q}_{(3)} = 1.4348, p = 0.6974)$. We fail to reject ${H}_{0}$: there is insufficient evidence of a difference in prediction accuracy among the four tipsters.
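Cochran's Q depends only on the row and column totals of the 0/1 table: with $k$ tipsters, column totals $C_j$, row totals $R_i$ and grand total $N$, $Q = (k-1)\left(k\sum C_j^2 - N^2\right) / \left(kN - \sum R_i^2\right)$. The Python sketch below applies that textbook formula to the tipster scores; the function name `cochran_q` is illustrative.

```python
# One row per day: (tipster 1, tipster 2, tipster 3, tipster 4); 1 = correct.
scores = [
    (1, 1, 1, 1), (0, 1, 1, 1), (0, 1, 0, 0), (1, 1, 1, 0), (1, 0, 1, 0),
    (1, 1, 1, 1), (1, 1, 1, 1), (0, 0, 1, 1), (1, 0, 0, 0), (1, 0, 1, 1),
]


def cochran_q(rows):
    """Cochran's Q statistic and the per-column success counts."""
    k = len(rows[0])
    col_totals = [sum(col) for col in zip(*rows)]
    row_totals = [sum(row) for row in rows]
    grand = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - grand ** 2)
    denominator = k * grand - sum(r * r for r in row_totals)
    return numerator / denominator, col_totals


q_stat, col_totals = cochran_q(scores)
print(col_totals, round(q_stat, 7))
```

Days on which all four tipsters agree (all 1s or all 0s) add the same amount to $kN$ and $\sum R_i^2$ and leave the numerator's comparison of column totals unchanged, so they do not affect Q.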

FIVE

Berry (1987) gives the following data for numbers of premature ventricular contractions per hour for 12 patients with cardiac arrhythmias when each is treated with 3 drugs A, B, C

patient	drug A	drug B	drug C
1	170	7	0
2	19	1.4	6
3	187	205	18
4	10	0.3	1
5	216	0.2	22
6	49	33	30
7	7	37	3
8	474	9	5
9	0.4	0.6	0
10	1.4	63	36
11	27	145	26
12	29	0	0

Use a Friedman test to investigate differences in response between drugs. In particular, is there evidence of a difference between drug A and drug B?

\[{H}_{0}:\text{The distribution of premature ventricular contractions is identical across drugs A,B and C} \\ \\ {H}_{1}:\text{At least one drug produces a different contraction rate}\]

## Friedman test
## ------------------
## p-value: 0.0179
## Decision: Reject the null hypothesis

A Friedman test was conducted to investigate differences in premature ventricular contractions per hour among $12$ cardiac arrhythmia patients treated with three drugs (A, B, C), with $\alpha = 0.05$. Median contraction counts were $28, 8, \text{ and } 5.5$ for drugs A, B, and C respectively. The omnibus test was significant $({\chi^2}_{(2)} = 8.0426, p = 0.0179)$. We reject the null hypothesis, concluding that drug treatment has a significant effect on the rate of ventricular contraction per hour.

SIX

Cohen (1983) gives data for numbers of births in Israel for each day in 1975. We give below data for numbers of births on each day in the 10th, 20th, 30th and 40th weeks of the year

week	day of the week	number of births
10	Monday	108
10	Tuesday	106
10	Wednesday	100
10	Thursday	85
10	Friday	85
10	Saturday	92
10	Sunday	96
20	Monday	82
20	Tuesday	99
20	Wednesday	89
20	Thursday	125
20	Friday	74
20	Saturday	85
20	Sunday	100
30	Monday	96
30	Tuesday	101
30	Wednesday	108
30	Thursday	103
30	Friday	108
30	Saturday	96
30	Sunday	110
40	Monday	124
40	Tuesday	106
40	Wednesday	111
40	Thursday	115
40	Friday	99
40	Saturday	96
40	Sunday	111

Perform Friedman analyses to determine whether the data indicate a difference in birth rate between days of the week that shows consistency over the four selected weeks

\[{H}_{0}:\text{The distribution of daily birth counts is consistent across the four selected weeks} \\ \\ {H}_{1}:\text{At least one week shows a different pattern of daily births}\]

## Friedman test
## ------------------
## p-value: 0.02
## Decision: Reject the null hypothesis

A Friedman test was used to assess whether daily birth counts were consistent over the four selected weeks (weeks 10, 20, 30 and 40) in Israel in 1975, with $\alpha = 0.05$. The four weeks were treated as the conditions and the seven days of the week as the blocks. The test was significant $({\chi^2}_{(3)}=9.84, p = 0.02)$. We reject ${H}_{0}$. The data indicate that daily birth counts differed systematically between the four selected weeks, suggesting week-to-week variation in the number of births.

SEVEN

Snee (1985) gives data on average liver weights per bird for chicks given three levels of growth promoter (none, low, high). Blocks correspond to different bird houses. Use a Friedman test to see if there is evidence of an effect of growth promoter

house	none	low	high
1	3.93	3.99	4.08
2	3.78	3.96	3.94
3	3.88	3.96	4.02
4	3.93	4.03	4.06
5	3.84	4.10	3.94
6	3.75	4.02	4.09
7	3.98	4.06	4.17
8	3.84	3.92	4.12

\[{H}_{0}:\text{Average liver weight is identical across the three promoter dose levels}\\ \\ {H}_{1}:\text{At least one dose level produces a different liver weight}\]

## Friedman test
## ------------------
## p-value: 0.0015
## Decision: Reject the null hypothesis

A Friedman test was performed to assess the effect of growth promoter dose level (none, low, high) on average liver weight per chick across $n = 8$ bird houses, with $\alpha = 0.05$. Median liver weights were $3.860, 4.005, \text{ and } 4.070$ for none, low, and high doses respectively, suggesting a monotonic increase. The Friedman test was highly significant $({\chi^2}_{(2)}=13, p = 0.0015)$. We reject ${H}_{0}$ and conclude that the growth promoter dose affects average liver weight.

Since dose levels are ordered, a Page test is appropriate. Try this also

\[{H}_{0}:\text{Liver weights are identically distributed across dose levels}\\ \\ {H}_{1}:\text{Liver weights increase monotonically with dose level}\]

## Jonckheere-Terpstra test
## ------------------------------
## p-value: 3e-04
## Decision: Reject the null hypothesis

Since the dose levels are ordered, a Jonckheere-Terpstra directional analysis was also conducted. Unlike the Page test, it treats the dose groups as independent samples and does not use the blocking by bird house. The overall trend was significant $(p = 0.0003)$. Pairwise results confirmed a significant increase from none to low $(W = 59, p = 0.003)$ and from none to high $(W = 62, p = 0.001)$. The comparison of low to high was non-significant $(W = 43, p = 0.134)$, suggesting the most pronounced effect occurs between the no-dose and low-dose conditions, with diminishing gains at higher doses. Overall, the data provide strong evidence of a dose-ordered increase in average liver weight attributable to the growth promoter.
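The exercise names the Page test, which keeps the blocking by bird house. The following is a minimal Python sketch of Page's $L$ statistic for these data, with a one-sided p-value from the large-sample normal approximation. It is an illustration rather than the analysis reported above: the function name `page_test` is hypothetical, the approximation is rough for only 8 blocks and 3 treatments, and exact tables would be preferable for a final report.

```python
import math

# Average liver weight per bird: one row per house, columns in the
# hypothesised increasing order none < low < high.
liver = [
    (3.93, 3.99, 4.08), (3.78, 3.96, 3.94), (3.88, 3.96, 4.02), (3.93, 4.03, 4.06),
    (3.84, 4.10, 3.94), (3.75, 4.02, 4.09), (3.98, 4.06, 4.17), (3.84, 3.92, 4.12),
]

def page_test(blocks):
    """Page's L with a normal approximation.

    Assumes no ties within a block, which holds for these data.
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0] * k
    for row in blocks:
        # Rank the k treatments inside this block from smallest to largest.
        ordered = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(ordered, start=1):
            rank_sums[j] += rank
    # Weight each rank sum by its position in the hypothesised ordering.
    big_l = sum((j + 1) * rank_sums[j] for j in range(k))
    mean = n * k * (k + 1) ** 2 / 4
    var = n * k**2 * (k + 1) * (k**2 - 1) / 144
    z = (big_l - mean) / math.sqrt(var)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of N(0, 1)
    return big_l, z, p_one_sided, rank_sums

big_l, z, p_one_sided, rank_sums = page_test(liver)
print(rank_sums, big_l, round(z, 2), round(p_one_sided, 6))
```

With rank sums of 8, 18 and 22 the sketch gives $L = 110$ against a null mean of 96, pointing in the same direction as the ordered-alternative conclusion in the text.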

4. Comparing more than two unrelated samples

ONE

A researcher conducted a study with n = 15 participants to investigate strength gains from exercise. The participants were divided into three groups and given one of three treatments. Participants’ strength gains were measured and ranked. The rankings are presented in Table 6.8 below

Table 6.8
I	II	III
7	13	12
2	1	5
4	7	16
11	8	9
15	3	14

Use a Kruskal-Wallis H test with $\alpha = 0.05$ to determine if one or more of the groups were significantly different. If a significant difference exists, use a two-tailed Mann-Whitney U test or a two-sample Kolmogorov-Smirnov test to identify which groups are significantly different. Use the Bonferroni procedure to limit the type I error rate. Report your findings

\[ {H}_{0}: \text{Strength gains are identically distributed across the three treatment groups}\\ \\ {H}_{1}:\text{At least one treatment group differs in strength gains} \]

## Kruskal-Wallis Test
## ------------------------------------------------
## p-value: 0.2567
## Decision: Fail to reject the null hypothesis

A Kruskal-Wallis H test was conducted to compare strength gains across three treatment groups (I, II, and III) among $n=15$ participants (5 per group), with $\alpha = 0.05$. The tabulated values were ranked across all participants, yielding rank sums of ${R}_{I}=36.5, {R}_{II}=30.5 \text{ and } {R}_{III}=53$. The test was non-significant $({\chi^2}_{(2)}=2.72, p=0.257)$. We therefore fail to reject ${H}_{0}$: there is insufficient evidence to conclude that any treatment group produced significantly different strength gains. Since the omnibus test was non-significant, post-hoc pairwise comparisons were not warranted; no further inference is made about individual group differences.
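The rank sums and the H statistic can be reproduced from Table 6.8 with a short calculation. The following is a minimal Python sketch under the assumption that the tabulated values are re-ranked across all 15 participants with midranks for ties (the value 7 appears twice) and that H is divided by the standard tie-correction factor. The function name `kruskal_wallis` is illustrative, and the p-value formula $e^{-H/2}$ holds only for 2 degrees of freedom.

```python
import math

groups = {
    "I": [7, 2, 4, 11, 15],
    "II": [13, 1, 7, 8, 3],
    "III": [12, 5, 16, 9, 14],
}

def kruskal_wallis(samples):
    """Return (H, rank_sums) with midranks and the usual tie correction."""
    pooled = sorted(v for s in samples for v in s)
    n = len(pooled)
    rank_of = {}
    tie_term = 0
    for v in set(pooled):
        first = pooled.index(v) + 1       # first 1-based position of v
        t = pooled.count(v)               # number of observations tied at v
        rank_of[v] = first + (t - 1) / 2  # average of the positions v occupies
        tie_term += t**3 - t
    rank_sums = [sum(rank_of[v] for v in s) for s in samples]
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(s) for r, s in zip(rank_sums, samples)
    ) - 3 * (n + 1)
    h /= 1 - tie_term / (n**3 - n)
    return h, rank_sums

h, rank_sums = kruskal_wallis(list(groups.values()))
p_value = math.exp(-h / 2)  # chi-square upper tail, valid for df = 2 only
print(rank_sums, round(h, 2), round(p_value, 4))
```

The sketch returns rank sums of 36.5, 30.5 and 53, matching the values quoted in the write-up.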

TWO

A researcher investigated how physical attraction influences the perception among others of a person’s effectiveness with difficult tasks. The photographs of 24 people were shown to a focus group. The group was asked to classify the photos into three groups: very attractive, average and very unattractive. Then, the group ranked the photographs according to their impressions of how capable they were of solving difficult problems. Table 6.9 shows the classification and rankings of the people in the photos(1 = most effective, 24 = least effective)

Table 6.9
very attractive	average	very unattractive
1	3	11
2	4	15
5	8	16
6	9	18
7	13	20
10	14	21
12	19	23
17	22	24

Use a Kruskal-Wallis H test with $\alpha = 0.05$ to determine if one or more of the groups are significantly different. If a significant difference exists, use two-tailed Mann-Whitney U tests to identify which groups are significantly different. Use the Bonferroni procedure to limit the type I error rate. Report your findings.

\[{H}_{0}: \text{The perceived effectiveness rankings are identical across all three attractiveness groups} \\ \\ {H}_{1}: \text{At least one attractiveness group differs in perceived effectiveness rankings}\]

## Kruskal-Wallis Test
## ------------------------------------------------
## p-value: 0.007
## Decision: Reject the null hypothesis

A Kruskal-Wallis H test was performed to compare perceived effectiveness rankings (1 = most effective, 24 = least effective) across three attractiveness classifications (very attractive, average, and very unattractive) among $n=24$ individuals, with $\alpha=0.05$. Mean rankings were $7.5, 11.5 \text{ and } 18.5$ respectively, suggesting a systematic gradient. The test was significant $({\chi^2}_{(2)}=9.92, p=0.007)$, indicating that at least one group differed. We reject the null hypothesis.

Having established a significant difference in perceived ranking among the three attractiveness groups, we conduct pairwise Mann-Whitney tests to locate it. The Bonferroni procedure adjusts the significance level for each of the three comparisons to $0.05/3=0.0167$. The test results are:

## Bonferroni-Mann-Whitney U test (very attractive vs average)
## ------------------------------------------------------------
## p-value: 0.2271
## Decision: Fail to reject the null hypothesis
## 
## Bonferroni-Mann-Whitney U test (very attractive vs very unattractive)
## ---------------------------------------------------------------------
## p-value: 0.0039
## Decision: Reject the null hypothesis
## 
## Bonferroni-Mann-Whitney U test (average vs very unattractive)
## -------------------------------------------------------------
## p-value: 0.0406
## Decision: Fail to reject the null hypothesis

The comparison between the very attractive and average groups was non-significant $(W = 20, p = 0.2271)$. The comparison between the very attractive and very unattractive groups was significant $(W = 4, p = 0.0039)$. The comparison between the average and very unattractive groups was non-significant at the Bonferroni-adjusted level $(W = 12, p = 0.0406 > 0.0167)$. The data therefore provide sufficient evidence that individuals perceived as very attractive were ranked as significantly more effective at solving difficult problems than those perceived as very unattractive.
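The three pairwise comparisons and the Bonferroni decision rule can be sketched in Python as follows. The software call behind the output is not shown, so the p-value method here is an assumption: a two-sided normal approximation with continuity correction, chosen because it reproduces the reported p-values. The function name `mann_whitney` is illustrative.

```python
import math
from itertools import combinations

ranks = {
    "very attractive": [1, 2, 5, 6, 7, 10, 12, 17],
    "average": [3, 4, 8, 9, 13, 14, 19, 22],
    "very unattractive": [11, 15, 16, 18, 20, 21, 23, 24],
}

def mann_whitney(x, y):
    """Return (U, p) where U counts pairs with x > y (ties count 1/2).

    p is two-sided, from a normal approximation with continuity
    correction and no tie correction (there are no ties in these data).
    """
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    m, n = len(x), len(y)
    mean = m * n / 2
    sd = math.sqrt(m * n * (m + n + 1) / 12)
    z = max(0.0, abs(u - mean) - 0.5) / sd
    return u, math.erfc(z / math.sqrt(2))  # equals 2 * P(Z > z)

alpha = 0.05 / 3  # Bonferroni-adjusted level for three comparisons
results = {}
for a, b in combinations(ranks, 2):
    u, p = mann_whitney(ranks[a], ranks[b])
    results[(a, b)] = (u, p, p < alpha)
    print(f"{a} vs {b}: U = {u:.0f}, p = {p:.4f}, reject = {p < alpha}")
```

Only the very attractive versus very unattractive comparison falls below the adjusted level of 0.0167, in line with the decisions listed above.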

THREE

Lubischew (1962) gives measurements of maximum head width in units of 0.01 mm for three species of Chaetocnema. Part of his data is given below. Use a Kruskal–Wallis test to see if there is a species difference in head widths

species 1	species 2	species 3
53	49	58
50	49	51
52	47	45
50	54	53
49	43	49
47	51	51
54	49	50
51	51	51
52	50
57	46
	49

\[{H}_{0}:\text{The distribution of maximum head widths is identical across all three \textit{Chaetocnema} species} \\ \\ {H}_{1}:\text{At least one species differs in head width distribution}\]

## Kruskal-Wallis Test
## ------------------------------------------------
## p-value: 0.1088
## Decision: Fail to reject the null hypothesis

A Kruskal-Wallis H test was used to assess species differences in maximum head width (units: 0.01 mm) across three species of Chaetocnema (${n}_{1}=10, {n}_{2}=11 \text{ and } {n}_{3}=8$), with $\alpha =0.05$. Median head widths were $51.5, 49.0, \text{ and } 51.0$ for species 1, 2, and 3 respectively. The test was non-significant $({\chi^2}_{(2)}=4.44, p = 0.1088)$. We fail to reject the null hypothesis: the data do not provide sufficient evidence of a species difference in head widths at the 5% significance level.

5. Tests for Nominal scale data

ONE

A researcher wishes to determine if there is an association between the level of a teacher’s education and his/her job satisfaction. He surveyed 158 teachers. The frequencies of the corresponding results are displayed in Table 8.19

Table 8.19
	Bachelor’s degree	Master’s degree	Post-Master’s degree
satisfied	60	41	19
unsatisfied	10	13	15

First, use a ${\chi}^{2}$-test for independence with $\alpha = 0.05$ to determine if there is an association between level of education and job satisfaction. Then, determine the effect size for the association. Report your findings

\[{H}_{0}:\text{Level of education and job satisfaction are independent} \] \[{H}_{1}:\text{Level of education and job satisfaction are associated}\]

## Test of independence (Chi-Square approximation)
## ------------------------------------------------
## p-value: 0.0038
## Decision: Reject the null hypothesis

In conclusion, education level and job satisfaction are significantly associated at the 5% significance level $(p = 0.0038)$.
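The exercise also asks for an effect size. The following is a minimal Python sketch that recomputes the chi-square statistic from Table 8.19 and derives Cramér's V from it. It is an illustration, not the original analysis: the function name `chi_square_independence` is hypothetical, and the p-value formula $e^{-\chi^2/2}$ holds only because this table has 2 degrees of freedom.

```python
import math

observed = [
    [60, 41, 19],  # satisfied: Bachelor's, Master's, Post-Master's
    [10, 13, 15],  # unsatisfied
]

def chi_square_independence(table):
    """Return (statistic, df, N) for a test of independence on a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, total

stat, df, n = chi_square_independence(observed)
p_value = math.exp(-stat / 2)  # chi-square upper tail, valid for df = 2 only
# Cramér's V: sqrt(chi2 / (N * (min(rows, cols) - 1)))
cramers_v = math.sqrt(stat / (n * (min(len(observed), len(observed[0])) - 1)))
print(round(stat, 2), df, round(p_value, 4), round(cramers_v, 3))
```

The sketch reproduces the reported p-value and gives a Cramér's V of roughly 0.27, which by common rules of thumb would usually be read as a small-to-moderate association.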

TWO

A professor gave her class a 10-item survey to determine the students’ satisfaction with the course. Survey question responses were measured using a five-point Likert scale. The survey had a score range from +20 to −20. Table 8.20 shows the scores of the students in a class of 13 students who rated the professor

gender	score	satisfaction
male	+12	+
male	+6	+
male	-5	-
male	-10	-
male	+17	+
male	+4	+
female	-2	-
female	-13	-
female	+10	+
female	-8	-
female	-11	-
female	-4	-
female	-14	-

Use a Fisher exact test with $\alpha = 0.05$ to determine if there is an association between gender and course satisfaction of the professor’s class. Then, determine the effect size for the association. Report your findings.

\[{H}_{0}: \text{Gender and course satisfaction are independent}\] \[{H}_{1}: \text{Gender and course satisfaction are associated}\]

## Test of independence (Fisher's Exact Test)
## ------------------------------------------------
## p-value: 0.1026
## Decision: Fail to reject the null hypothesis

There is insufficient evidence against the null hypothesis of independence between gender and course satisfaction at the 5% significance level $(p = 0.1026)$.
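The Fisher exact p-value can be recomputed by hand from the $2 \times 2$ table implied by the satisfaction column (males: 4 satisfied, 2 dissatisfied; females: 1 satisfied, 6 dissatisfied). The following is a minimal Python sketch using the hypergeometric distribution, with the phi coefficient added as one possible effect size. The function name `fisher_exact_two_sided` is illustrative, and the two-sided rule used (summing all tables no more probable than the observed one) is the common convention, assumed here to match the reported output.

```python
from math import comb, sqrt

# Rows: male, female. Columns: satisfied (+), dissatisfied (-).
table = [[4, 2], [1, 6]]

def fisher_exact_two_sided(t):
    """Two-sided Fisher exact p-value for a 2x2 table with fixed margins."""
    (a, b), (c, d) = t
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)

    def prob(x):
        # Probability that the top-left cell equals x given the margins.
        return comb(r1, x) * comb(r2, c1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum every table whose probability does not exceed the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

(a, b), (c, d) = table
p_value = fisher_exact_two_sided(table)
phi = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))
print(round(p_value, 4), round(phi, 3))
```

The p-value equals 132/1287, which rounds to the reported 0.1026.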

THREE

In an English parliamentary electoral constituency a random sample of 400 voters are classified by age and political affiliation as follows

	30 or under	31-40	41-55	56 or over
Conservative	31	32	39	34
Liberal Democrat	16	19	25	31
Labour	36	27	58	52

Is there evidence of an association between political affiliation and age? It is generally (though not universally) accepted that the Conservative, Liberal Democrat and Labour parties represent an ordering of right, middle and left in the political spectrum

## Test of independence (Chi-Square approximation)
## ------------------------------------------------
## p-value: 0.4435
## Decision: Fail to reject the null hypothesis

At the 5% level there is insufficient evidence of an association between political affiliation and age $(p = 0.4435)$.

FOUR

Agresti (1984) quotes the following data on cross-classification of attitudes towards abortion and amounts of schooling based on the US General Social Survey, 1972. Test these data for evidence of association between attitudes and educational background

	Disapprove	Neutral	Approve
Less than high school	209	101	237
High school	151	126	426
More than high school	16	21	138

## Test of independence (Chi-Square approximation)
## ------------------------------------------------
## p-value: 0
## Decision: Reject the null hypothesis

At the 5% significance level, there is sufficient evidence of an association between subjects’ attitudes towards abortion and their educational background $(p < 0.001)$.

6. Variable comparison

ONE

a

A china manufacturer is investigating market response to seven designs of dinner set. The main markets are the British and American. To get some idea of preferences in the two markets a survey of 100 British and 100 American women is carried out and each woman is asked to rank the designs in order of preference from 1 for favorite to 7 for least acceptable. For each country the 100 rank scores for each design is totalled. The design with the lowest total is assigned rank 1, that with the next lowest total rank 2, and so on. Overall rankings for each country are

Design	British rank	American rank
A	1	3
B	2	4
C	3	1
D	4	5
E	5	2
F	6	7
G	7	6

Calculate the Spearman and Kendall correlation coefficients. Is there evidence of a positive association between orders of preference?

\[\text{Spearman's rank correlation coefficient}\\ {H}_{0}:{\rho}_{s}=0\\ {H}_{1}:{\rho}_{s}>0\\ \\ \text{Kendall's rank correlation coefficient}\\ {H}_{0}:{\tau}=0\\ {H}_{1}:{\tau}>0\]

## Spearman's correlation
## --------------------------
## value: 0.5714
## p-value: 0.1
## Decision: Fail to reject the null hypothesis
## 
## ====================================================
## 
## Kendall's correlation
## --------------------------
## value: 0.4286
## p-value: 0.1194
## Decision: Fail to reject the null hypothesis

We fail to reject the null hypothesis for both rank coefficients. The data do not provide sufficient evidence of a positive association between the British and American orders of preference for the 7 dinner set designs.
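Because there are only $7! = 5040$ possible rankings, both coefficients and their exact one-sided p-values can be obtained by enumeration. The following is a minimal Python sketch under the assumption that the reported p-values are exact one-sided values for a positive association; the helper names are illustrative.

```python
from itertools import permutations

british = [1, 2, 3, 4, 5, 6, 7]
american = [3, 4, 1, 5, 2, 7, 6]

def sum_sq_diff(x, y):
    """Sum of squared rank differences (smaller means closer agreement)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def discordant_pairs(x, y):
    """Number of design pairs that the two rankings order oppositely."""
    n = len(x)
    return sum(
        (x[i] - x[j]) * (y[i] - y[j]) < 0
        for i in range(n) for j in range(i + 1, n)
    )

n = len(british)
d2 = sum_sq_diff(british, american)
rho = 1 - 6 * d2 / (n * (n * n - 1))

pairs = n * (n - 1) // 2
disc = discordant_pairs(british, american)
tau = (pairs - 2 * disc) / pairs  # (concordant - discordant) / pairs, no ties

# Exact one-sided p-values: the share of all equally likely rankings that
# agree with the British order at least as closely as the American order does.
perms = list(permutations(british))
p_rho = sum(sum_sq_diff(british, p) <= d2 for p in perms) / len(perms)
p_tau = sum(discordant_pairs(british, p) <= disc for p in perms) / len(perms)
print(round(rho, 4), round(p_rho, 4), round(tau, 4), round(p_tau, 4))
```

The same enumeration would apply to any other pair of countries once their rank columns are supplied.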

b

The manufacturer above later decides to assess preferences in the Canadian and Australian markets by a similar method. Calculate the Spearman and Kendall correlation coefficients. Is there evidence of a positive association between orders of preference?

## Spearman's correlation
## --------------------------
## value: 0
## p-value: 0.5183
## Decision: Fail to reject the null hypothesis
## 
## ====================================================
## 
## Kendall's correlation
## --------------------------
## value: 0.0476
## p-value: 0.5
## Decision: Fail to reject the null hypothesis

We fail to reject the null hypothesis for both rank coefficients. The data do not provide sufficient evidence of a positive association between the Canadian and Australian orders of preference for the 7 dinner set designs.

c

Perform an appropriate analysis of the ranked data for all four countries in Exercises 7.6 and 7.7 to assess the evidence for any overall concordance

UK-CAN

## Spearman's correlation
## --------------------------
## value: 0.3929
## p-value: 0.1978
## Decision: Fail to reject the null hypothesis
## 
## ====================================================
## 
## Kendall's correlation
## --------------------------
## value: 0.2381
## p-value: 0.281
## Decision: Fail to reject the null hypothesis

We fail to reject the null hypothesis for both rank coefficients. The data do not provide sufficient evidence of a positive association between the British and Canadian orders of preference for the 7 dinner set designs.

UK-AUS

## Spearman's correlation
## --------------------------
## value: 0.6786
## p-value: 0.0548
## Decision: Fail to reject the null hypothesis
## 
## ====================================================
## 
## Kendall's correlation
## --------------------------
## value: 0.4286
## p-value: 0.1194
## Decision: Fail to reject the null hypothesis

We fail to reject the null hypothesis for both rank coefficients. The data do not provide sufficient evidence of a positive association between the British and Australian orders of preference for the 7 dinner set designs, although the Spearman result $(p = 0.0548)$ is close to the 5% level.

USA-CAN

## Spearman's correlation
## --------------------------
## value: 0.8214
## p-value: 0.0171
## Decision: Reject the null hypothesis
## 
## ====================================================
## 
## Kendall's correlation
## --------------------------
## value: 0.619
## p-value: 0.0345
## Decision: Reject the null hypothesis

We reject the null hypothesis for both rank coefficients. The data provide sufficient evidence of a positive association between the American and Canadian orders of preference for the 7 dinner set designs.

USA-AUS

## Spearman's correlation
## --------------------------
## value: 0.0357
## p-value: 0.4817
## Decision: Fail to reject the null hypothesis
## 
## ====================================================
## 
## Kendall's correlation
## --------------------------
## value: 0.0476
## p-value: 0.5
## Decision: Fail to reject the null hypothesis

We fail to reject the null hypothesis for both rank coefficients. The data do not provide sufficient evidence of a positive association between the American and Australian orders of preference for the 7 dinner set designs.

TWO

In a pharmacological experiment involving $\beta$-blocking agents, Sweeting (1982) recorded cardiac oxygen consumption (MVO) and left ventricular pressure (LVP) for a control group of dogs. Calculate the Kendall and Spearman correlation coefficients. Is there evidence of correlation?

Dog	MVO	LVP
A	78	32
B	92	33
C	116	45
D	90	30
E	106	38
F	78	24
G	99	44

We test the relevant hypotheses:

\[\text{Spearman's rank correlation coefficient}\\ {H}_{0}:{\rho}_{s}=0\\ {H}_{1}:{\rho}_{s}\neq0\\ \\ \text{Kendall's rank correlation coefficient}\\ {H}_{0}:{\tau}=0\\ {H}_{1}:{\tau}\neq0\]

## Spearman's correlation
## --------------------------
## value: 0.9009
## p-value: 0.0056
## Decision: Reject the null hypothesis
## 
## ====================================================
## 
## Kendall's correlation
## --------------------------
## value: 0.7807
## p-value: 0.0151
## Decision: Reject the null hypothesis

For this pharmacological experiment, cardiac oxygen consumption and left ventricular pressure were recorded for 7 dogs ($n=7$) in the control group. The paired observations are ranked for the analysis. The Spearman rank correlation coefficient was significant $({r}_{s} = 0.9009, p = 0.0056)$. Kendall’s coefficient was also significant $({\tau} = 0.7807, p = 0.0151)$. The data show a significant positive monotonic association between cardiac oxygen consumption and left ventricular pressure.
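Two dogs share the MVO value 78, so the simple rank-difference formula for Spearman's coefficient no longer applies exactly. The following is a minimal Python sketch of the tie-aware versions: Spearman's coefficient as the Pearson correlation of midranks, and Kendall's tau-b. The function names are illustrative, and the sketch covers only the coefficients, not the p-values.

```python
import math

mvo = [78, 92, 116, 90, 106, 78, 99]
lvp = [32, 33, 45, 30, 38, 24, 44]

def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    s = sorted(values)
    return [s.index(v) + 1 + (s.count(v) - 1) / 2 for v in values]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def kendall_tau_b(x, y):
    """(concordant - discordant) / sqrt((n0 - ties in x) * (n0 - ties in y))."""
    n = len(x)
    conc = disc = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                conc += 1
            elif dx * dy < 0:
                disc += 1
    n0 = n * (n - 1) // 2
    return (conc - disc) / math.sqrt((n0 - ties_x) * (n0 - ties_y))

rho = pearson(midranks(mvo), midranks(lvp))
tau_b = kendall_tau_b(mvo, lvp)
print(round(rho, 4), round(tau_b, 4))
```

Both values agree with the coefficients in the output above, which suggests the reported figures already account for the tie.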

THREE

Bardsley and Chambers (1984) gave numbers of beef cattle and sheep on 19 large farms in a region. Is there evidence of correlation

cattle	sheep
41	4716
0	4605
42	4951
15	2745
47	6592
0	8934
0	9165
0	5917
56	2618
67	1105
707	150
368	2005
231	3222
104	7150
132	8658
200	6304
172	1800
146	5270
0	1537

We test the relevant hypotheses:

\[\text{Spearman's rank correlation coefficient}\\ {H}_{0}:{\rho}_{s}=0\\ {H}_{1}:{\rho}_{s}\neq0\\ \\ \text{Kendall's rank correlation coefficient}\\ {H}_{0}:{\tau}=0\\ {H}_{1}:{\tau}\neq0\]

## Spearman's correlation
## --------------------------
## value: -0.331
## p-value: 0.1663
## Decision: Fail to reject the null hypothesis
## 
## ====================================================
## 
## Kendall's correlation
## --------------------------
## value: -0.235
## p-value: 0.168
## Decision: Fail to reject the null hypothesis

The numbers of beef cattle and sheep were recorded for 19 large farms in a region. These observations are ranked for the analysis. The Spearman rank correlation coefficient was not significant $({r}_{s} = -0.331, p = 0.1663)$, and neither was Kendall’s coefficient $({\tau} = -0.235, p = 0.168)$. The sample shows a weak negative monotonic association between the numbers of cattle and sheep on these farms, but it is not statistically significant at the 5% level.

7. References

- Conover, W. J. (1999). Practical Nonparametric Statistics (3rd ed.). John Wiley & Sons.
- Corder, G. W., & Foreman, D. I. (2014). Nonparametric Statistics: A Step-by-Step Approach (2nd ed.). John Wiley & Sons.

Non-parametric Methods - Problems

Gift Pamba

May-June 2026

participant	week 1	week 2	week 3	week 4
1	7	6	8	9
2	9	8	10	1
3	15	18	16	17
4	7	6	8	9
5	7	8	10	11
6	10	14	13	11
7	12	19	11	13
8	7	4	2	5
9	8	7	9	5
10	12	16	14	15

participant	baseline	month 1	month 2
1	66	67	69
2	49	50	56
3	51	52	49
4	65	65	69
5	42	43	46
6	38	39	40
7	33	31	39
8	41	41	44
9	46	47	48
10	45	46	46
11	36	33	34
12	51	55	67
