1. Comparing two related samples
ONE
A teacher wished to determine if providing a bilingual dictionary to
students with limited English proficiency improves math test scores. A
small class of students (n = 10) was selected. Students
were given two math tests. Each test covered the same type of math
content; however, students were provided a bilingual dictionary on the
second test. The data in Table 3.10 represent the students’
performance on each math test.
Table 3.10
| student | test 1 (without dictionary) | test 2 (with dictionary) |
| 1 | 30 | 39 |
| 2 | 56 | 46 |
| 3 | 48 | 37 |
| 4 | 47 | 44 |
| 5 | 43 | 32 |
| 6 | 45 | 39 |
| 7 | 36 | 41 |
| 8 | 44 | 40 |
| 9 | 44 | 38 |
| 10 | 40 | 46 |
Use a one-tailed Wilcoxon signed-rank test and a
one-tailed sign test to determine which testing
condition resulted in higher scores. Use \(\alpha = 0.05\). Report your findings.
\[{H}_{0}:\text{The bilingual dictionary
has no effect on math test scores}\\ \\ {H}_{1}:\text{Math test scores
are higher without the bilingual dictionary}\]
## One-tailed Wilcoxon signed-rank test
## ---------------------------------------
## p-value: 0.1099
## Decision: Fail to reject the null hypothesis
##
## One-tailed Sign Test
## --------------------------
## p-value: 0.1719
## Decision: Fail to reject the null hypothesis
A one-tailed Wilcoxon signed-rank test and a
one-tailed sign test were conducted to determine
whether students performed better on the math test without a bilingual
dictionary, with \(\alpha = 0.05\) and
\(n = 10\). Median scores were \(44.0 \text{ (without dictionary) } \text{ and }
39.5 \text{ (with dictionary) }\). For the Wilcoxon signed-rank
test, the result was non-significant \((V = 40, p = 0.1099)\). The sign test
yielded 7 positive differences and 3 negative
differences, also non-significant \((p = 0.1719)\). Both tests fail to reject
\({H}_{0}\) — there is insufficient
evidence to conclude that students scored higher without the bilingual
dictionary.
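The two p-values above can be reproduced by hand. The following is a minimal sketch in plain Python, assuming the sign test is the exact binomial test and that the Wilcoxon p-value comes from the tie-corrected normal approximation with a continuity correction (an assumption made because the differences contain ties; the function names are illustrative, not taken from the original analysis).

```python
import math

# Scores from Table 3.10: first test (no dictionary) and second test (dictionary).
without_dict = [30, 56, 48, 47, 43, 45, 36, 44, 44, 40]
with_dict = [39, 46, 37, 44, 32, 39, 41, 40, 38, 46]


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def sign_test_upper(diffs):
    """One-tailed exact sign test: P(X >= number of positive differences)."""
    nonzero = [d for d in diffs if d != 0]
    n, pos = len(nonzero), sum(d > 0 for d in nonzero)
    p = sum(math.comb(n, k) for k in range(pos, n + 1)) / 2 ** n
    return pos, n - pos, p


def wilcoxon_upper(diffs):
    """One-tailed signed-rank test via the tie-corrected normal approximation."""
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    ranks = midranks([abs(d) for d in nonzero])
    v = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    mean = n * (n + 1) / 4
    tie_term = sum(t ** 3 - t for t in (ranks.count(r) for r in set(ranks)))
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (v - mean - 0.5) / math.sqrt(var)  # 0.5 is the continuity correction
    return v, 0.5 * math.erfc(z / math.sqrt(2))


diffs = [a - b for a, b in zip(without_dict, with_dict)]
pos, neg, p_sign = sign_test_upper(diffs)
v, p_wilcoxon = wilcoxon_upper(diffs)
print(pos, neg, round(p_sign, 4), v, round(p_wilcoxon, 4))
```

Differences are taken as "without minus with", so a large \(V\) supports \({H}_{1}\) as stated above.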
TWO
A research study was done to investigate the influence of being alone
at night on the human male heart rate. Ten men were sent into a wooded
area, one at a time, at night, for 20 minutes. They had a heart monitor
to record their pulse rate. The second night, the same men were sent
into a similar wooded area accompanied by a companion. Their pulse rate
was recorded again. The researcher wanted to see if having a companion
would change their pulse rate. The median pulse rates are recorded in
Table 3.11. Use a two-tailed Wilcoxon signed-rank
test and a two-tailed sign test to determine
which condition produced a higher pulse rate. Use \(\alpha = 0.05\). Report your findings.
Table 3.11
| participant | alone | with companion |
| A | 88 | 72 |
| B | 77 | 74 |
| C | 91 | 80 |
| D | 70 | 77 |
| E | 80 | 71 |
| F | 85 | 83 |
| G | 90 | 80 |
| H | 82 | 91 |
| I | 93 | 86 |
| J | 75 | 69 |
\[{H}_{0}:\text{There is no difference in
pulse rate between being alone and having a companion}\\ \\
{H}_{1}:\text{There is a difference in pulse rate between the two
conditions}\]
## Two-tailed Wilcoxon signed-rank test
## ---------------------------------------
## p-value: 0.1025
## Decision: Fail to reject the null hypothesis
##
## Two-tailed Sign Test
## --------------------------
## p-value: 0.1094
## Decision: Fail to reject the null hypothesis
Two-tailed Wilcoxon signed-rank and sign
tests were conducted to assess whether having a companion
affected heart rate among \(n = 10\)
male participants, with \(\alpha =
0.05\). Median pulse rates were \(83.5
\text{ bpm (alone) and }78.5 \text{ bpm (with companion) }\). The
Wilcoxon signed-rank test was
non-significant \((V = 44, p
= 0.1025)\). The sign test recorded 8
positive and 2 negative differences, also
non-significant \((p =
0.1094)\). Both tests fail to reject \({H}_{0}\): the data provide insufficient
evidence of a difference in pulse rate between the two
conditions.
THREE
A researcher conducts a pilot study to compare two treatments to help
obese female teenagers lose weight. She tests each individual in two
different treatment conditions. The data in Table 3.12
provides the number of pounds that each participant lost.
Table 3.12
| participant | treatment 1 | treatment 2 |
| 1 | 10 | 18 |
| 2 | 20 | 12 |
| 3 | 15 | 16 |
| 4 | 9 | 7 |
| 5 | 18 | 21 |
| 6 | 11 | 17 |
| 7 | 6 | 13 |
| 8 | 12 | 14 |
Use a two-tailed Wilcoxon signed-rank test and a
two-tailed sign test to determine which treatment
resulted in greater weight loss. Use \(\alpha
= 0.05\). Report your findings.
\[{H}_{0}:\text{There is no difference in
weight loss between treatment 1 and 2}\\ \\ {H}_{1}:\text{There is a
difference in weight loss between the two treatments}\]
## Two-tailed Wilcoxon signed-rank test
## ---------------------------------------
## p-value: 0.2924
## Decision: Fail to reject the null hypothesis
##
## Two-tailed Sign Test
## --------------------------
## p-value: 0.2891
## Decision: Fail to reject the null hypothesis
Two-tailed Wilcoxon signed-rank and sign
tests were applied to compare weight loss outcomes between two
treatments among \(n = 8\) female
participants, with \(\alpha = 0.05\).
Median weight loss was \(11.5 \text{ lbs under
treatment 1 and }15.0 \text{ lbs under treatment 2 }\). The
Wilcoxon signed-rank test was
non-significant \((V = 10, p
= 0.2924)\). The sign test yielded only 2
positive differences and 6 negative
differences, also non-significant \((p = 0.2891)\). Both tests fail to reject
\({H}_{0}\): there is no statistically
significant difference in weight loss between the two treatments,
although the descriptive pattern leans in favour of treatment 2.
FOUR
Twenty participants in an exercise program were measured on the
number of sit-ups they could do before physical exercise (first count)
and the number they could do after they had done at least 45 minutes of
other physical exercise (second count). Table 3.13 shows the
results for 20 participants obtained during two separate physical
exercise sessions. Determine the effect size for a
calculated z-score.
## Effect size: 0.4591
This constitutes a medium-to-large effect,
indicating that the exercise session had a practically meaningful impact
on sit-up performance.
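For reference, an effect size of this kind is commonly obtained as \(r = |z| / \sqrt{n}\) and read against Cohen's conventional cut-offs (0.1 small, 0.3 medium, 0.5 large). The sketch below assumes that formula; whether \(n\) counts pairs or total observations differs between textbooks, and the inputs used here are hypothetical, not the z-score behind the value reported above.

```python
import math


def effect_size_r(z, n):
    """Effect size r = |z| / sqrt(n) for a z-score from a rank test."""
    return abs(z) / math.sqrt(n)


def cohen_label(r):
    """Cohen's conventional cut-offs: 0.1 small, 0.3 medium, 0.5 large."""
    if r >= 0.5:
        return "large"
    if r >= 0.3:
        return "medium"
    if r >= 0.1:
        return "small"
    return "negligible"


# Hypothetical inputs for illustration only; not the z-score from Table 3.13.
r = effect_size_r(-2.0, 16)
print(round(r, 4), cohen_label(r))
```

Under these cut-offs the reported value of 0.4591 sits in the medium band, close to the large threshold.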
FIVE
A school is trying to get more students to participate in activities
that will make learning more desirable. Table 3.14 shows
the number of activities that each of the 10 students in one class
participated in last year before a new activity program was implemented
and this year after it was implemented. Construct a 95% confidence
interval based on the Wilcoxon signed-rank test to
determine whether the new activity program had a significant positive
effect on student participation.
\[{H}_{0}:\text{The new activity program
has no effect on student participation} \\ \\ {H}_{1}:\text{The new
activity program has a significant positive effect on student
participation}\]
## [-6.9999,1]
Since the above interval contains zero, we fail to reject \({H}_{0}\)— there is insufficient evidence
at \(\alpha = 0.05\) to conclude that
the new activity program had a significant positive effect on student
participation.
SIX
Samples of cream from each of ten dairies (A to J) are each divided
into two portions. One portion from each is sent to Laboratory I, the
other to Laboratory II, for bacterium counts. The counts (thousands of
bacteria \({ml}^{-1}\)) are:
| dairy | Laboratory I | Laboratory II |
| A | 11.7 | 10.9 |
| B | 12.1 | 11.9 |
| C | 13.3 | 13.4 |
| D | 15.1 | 15.4 |
| E | 15.9 | 14.8 |
| F | 15.3 | 14.8 |
| G | 11.9 | 12.3 |
| H | 16.2 | 15.0 |
| I | 15.1 | 14.2 |
| J | 13.6 | 13.1 |
Use the Wilcoxon signed-rank test to assess the
evidence for any consistent difference between laboratories for
subsamples from the same dairy. Obtain also 95 and 99 percent confidence
intervals for the mean and compare these with the intervals using the
optimal method when normality is assumed.
\[{H}_{0}:\text{There is no consistent
difference in bacteria counts between laboratory I and laboratory II}\\
\\ {H}_{1}:\text{There is a consistent difference in bacteria counts
between the two laboratories}\]
## Two-tailed Wilcoxon signed-rank test
## ------------------------------------------------
## p-value: 0.0526
## Decision: Fail to reject the null hypothesis
A two-tailed Wilcoxon signed-rank test was conducted
to assess whether a consistent difference existed between bacteria
counts \((\text{thousands }{ml}^{-1})\)
recorded by two laboratories across \(n =
10\) dairy subsamples, with \(\alpha =
0.05\) . Median counts were \(14.35
\text{ (Lab I) and }13.80\text{ (Lab II) }\). The test was
marginally non-significant \((V = 47, p = 0.0526)\). We fail to reject
\({H}_{0}\): the evidence for a
consistent laboratory difference falls just short of the \(5\%\) threshold. The
Hodges-Lehmann estimate of the location shift was \(0.450\text{ thousand bacteria }
{ml}^{-1}\).
The nonparametric 95% confidence interval for the difference was
\([-0.05,0.8999]\) and the 99%
confidence interval was \([-0.35,1.15]\), both straddling zero and
consistent with the test decision.
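The Hodges-Lehmann estimate quoted above is the median of the Walsh averages of the paired differences. A minimal sketch of that calculation follows; the counts are entered in tenths (an illustrative choice, so that the subtraction is exact in integer arithmetic) and converted back at the end.

```python
from statistics import median

# Bacterium counts in tenths of a thousand per ml, so arithmetic stays exact.
lab_1 = [117, 121, 133, 151, 159, 153, 119, 162, 151, 136]
lab_2 = [109, 119, 134, 154, 148, 148, 123, 150, 142, 131]

diffs = [a - b for a, b in zip(lab_1, lab_2)]

# Walsh averages: the mean of every pair of differences, including each
# difference paired with itself, giving n(n + 1)/2 values.
walsh = [
    (diffs[i] + diffs[j]) / 2
    for i in range(len(diffs))
    for j in range(i, len(diffs))
]

# Convert back to thousands per ml.
hodges_lehmann = median(walsh) / 10
print(len(walsh), hodges_lehmann)
```

The confidence limits are not reproduced here; values such as 0.8999 suggest they came from a numerical root-finding routine rather than from order statistics of the Walsh averages, though that is an inference from the output.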
SEVEN
For each of nine matched pairs of students, one student is allocated
to a series of lectures and the other to appropriate computer assisted
learning (CAL) material. At the end of the course, the students are
given the same examination paper. The marks achieved (out of 100)
are:
| pair | CAL | lectures |
| 1 | 50 | 25 |
| 2 | 56 | 58 |
| 3 | 51 | 65 |
| 4 | 46 | 38 |
| 5 | 88 | 91 |
| 6 | 79 | 32 |
| 7 | 81 | 31 |
| 8 | 95 | 13 |
| 9 | 73 | 49 |
\[{H}_{0}:\text{There is no difference in
examination performance between CAL and lecture-based learning}\\ \\
{H}_{1}:\text{There is a difference in examination performance between
the two methods}\]
Analyze these results by what you consider the most appropriate
parametric or nonparametric methods to determine whether or not they
provide acceptable evidence that CAL materials lead to better
examination results.
## Two-tailed Wilcoxon signed-rank test
## ------------------------------------------------
## p-value: 0.0742
## Decision: Fail to reject the null hypothesis
A two-tailed Wilcoxon signed-rank test was used to
compare examination marks (out of 100) between students assigned to
computer-assisted learning (CAL) and those assigned to lectures, across
\(n = 9\) matched pairs, with \(\alpha = 0.05\) . Median marks were \(73\text{ (CAL) and }38\text{ (lectures)
}\). Of the \(9\) pairs, \(6\) showed higher marks under CAL and \(3\) under lectures. Despite this
descriptive advantage, the test was non-significant
\((V = 38, p = 0.0742)\). We fail to
reject \({H}_{0}\)— there is
insufficient evidence at the \(5\%\)
level to conclude that CAL and lecture-based learning produce different
examination outcomes.
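Because these nine differences contain no zeros or ties, the p-value can be checked against the exact null distribution of \(V\). The sketch below counts, by dynamic programming, how many of the \(2^9\) equally likely sign assignments give each rank sum; doubling the smaller tail is one common two-sided convention and is assumed here.

```python
# Marks for the nine matched pairs, in the column order of the table above.
first = [50, 56, 51, 46, 88, 79, 81, 95, 73]
second = [25, 58, 65, 38, 91, 32, 31, 13, 49]

diffs = [a - b for a, b in zip(first, second)]
n = len(diffs)

# With no zero or tied differences, the ranks of |d| are simply 1..n.
by_size = sorted(diffs, key=abs)
v = sum(rank for rank, d in enumerate(by_size, start=1) if d > 0)

# counts[s] = number of subsets of {1, ..., n} whose ranks sum to s.
max_sum = n * (n + 1) // 2
counts = [1] + [0] * max_sum
for rank in range(1, n + 1):
    for s in range(max_sum, rank - 1, -1):
        counts[s] += counts[s - rank]

total = 2 ** n
p_upper = sum(counts[v:]) / total        # P(V >= observed)
p_lower = sum(counts[: v + 1]) / total   # P(V <= observed)
p_two_sided = min(1.0, 2 * min(p_upper, p_lower))
print(v, p_two_sided)
```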
3. Comparing more than two related samples
ONE
A graduate student performed a pilot study for his dissertation. He
wanted to examine the effects of animal companionship on elderly males.
He selected 10 male participants from a nursing home. Then he used an
ABAB research design, where A represented a week with the absence of a
cat and B represented a week with the presence of a cat. At the end of
each week, he administered a 20-point survey to measure the quality of
life satisfaction. The survey results are presented in
Table 5.9.
Table 5.9
| participant | week 1 | week 2 | week 3 | week 4 |
| 1 | 7 | 6 | 8 | 9 |
| 2 | 9 | 8 | 10 | 1 |
| 3 | 15 | 18 | 16 | 17 |
| 4 | 7 | 6 | 8 | 9 |
| 5 | 7 | 8 | 10 | 11 |
| 6 | 10 | 14 | 13 | 11 |
| 7 | 12 | 19 | 11 | 13 |
| 8 | 7 | 4 | 2 | 5 |
| 9 | 8 | 7 | 9 | 5 |
| 10 | 12 | 16 | 14 | 15 |
Use a Friedman test to determine if one or more of
the groups are significantly different. Since this is a pilot study, use
\(\alpha = 0.10\). If a significant
difference exists, use Wilcoxon signed-rank tests to
identify which groups are significantly different. Use the
Bonferroni procedure to limit the type I error rate.
Report your findings.
\[{H}_{0}:\text{The quality of life
satisfaction scores are identical across all four weeks} \\
{H}_{1}:\text{At least one week differs in quality of life
satisfaction}\]
## Friedman test
## ------------------------------------------------
## p-value: 0.5399
## Decision: Fail to reject the null hypothesis
A Friedman test was conducted to examine the effect of animal
companionship (ABAB design) on quality of life satisfaction among \(n=10\) elderly male participants across
four weeks, with \(\alpha = 0.10\).
Median scores were \(8.5, 8.0, 10.0, \text{
and } 10.0\) for weeks 1 through 4 respectively. The omnibus test
was non-significant \(({\chi^2}_{(3)}=2.16, p
= 0.5399)\). We therefore fail to reject \({H}_{0}\). The data provide insufficient
evidence of a difference in quality of life satisfaction
across the four weeks. Since the Friedman test was non-significant,
post-hoc Wilcoxon signed-rank comparisons were not warranted at
\(\alpha = 0.10\).
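The Friedman statistic for Table 5.9 can be recomputed from within-participant ranks. The sketch below assumes the usual tie-corrected form of the statistic and uses the closed-form chi-square tail for 3 degrees of freedom; the helper names are illustrative.

```python
import math

# Table 5.9: one row per participant, columns are weeks 1 to 4 (A, B, A, B).
scores = [
    [7, 6, 8, 9], [9, 8, 10, 1], [15, 18, 16, 17], [7, 6, 8, 9],
    [7, 8, 10, 11], [10, 14, 13, 11], [12, 19, 11, 13], [7, 4, 2, 5],
    [8, 7, 9, 5], [12, 16, 14, 15],
]


def row_midranks(row):
    """Rank one block, averaging the ranks of tied values."""
    return [
        sum(x < v for x in row) + (sum(x == v for x in row) + 1) / 2
        for v in row
    ]


def friedman(blocks):
    """Tie-corrected Friedman chi-square statistic for n blocks, k treatments."""
    n, k = len(blocks), len(blocks[0])
    ranks = [row_midranks(row) for row in blocks]
    rank_sums = [sum(r[j] for r in ranks) for j in range(k)]
    stat = 12 * sum(s * s for s in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)
    ties = sum(
        row.count(v) ** 3 - row.count(v) for row in blocks for v in set(row)
    )
    return stat / (1 - ties / (n * k * (k * k - 1)))


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


statistic = friedman(scores)
p_value = chi2_sf_df3(statistic)
print(round(statistic, 4), round(p_value, 4))
```

The weekly rank sums are 20, 26, 26 and 28, which is why the statistic is so small relative to its 3 degrees of freedom.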
TWO
A physical education teacher conducted an action research project to
examine a strength and conditioning program. Using 12 male participants,
she measured the number of curl-ups they could do in 1 minute. She
measured their performance before the program. Then, she measured their
performance at 1 month intervals. Table 5.10 presents the
performance results.
Table 5.10: Number of curl ups in a minute
| participant | baseline | month 1 | month 2 |
| 1 | 66 | 67 | 69 |
| 2 | 49 | 50 | 56 |
| 3 | 51 | 52 | 49 |
| 4 | 65 | 65 | 69 |
| 5 | 42 | 43 | 46 |
| 6 | 38 | 39 | 40 |
| 7 | 33 | 31 | 39 |
| 8 | 41 | 41 | 44 |
| 9 | 46 | 47 | 48 |
| 10 | 45 | 46 | 46 |
| 11 | 36 | 33 | 34 |
| 12 | 51 | 55 | 67 |
\[{H}_{0}:\text{The number of curl-ups is
identical across all three time points} \\ \\ {H}_{1}:\text{Performance
increases over time}\]
Use a Friedman test with \(\alpha = 0.05\) to determine if one or more
of the groups are significantly different. The teacher is expecting
performance gains, so if a significant difference exists, use
one-tailed Wilcoxon signed-rank tests to identify which
groups are significantly different. Use the Bonferroni
procedure to limit the type I error rate. Report your findings.
## Friedman test
## ------------------------------------------------
## p-value: 0.0041
## Decision: Reject the null hypothesis
A Friedman test was used to evaluate the effect of a strength and
conditioning program on curl-up performance across three time points
(baseline, month 1, month 2) for \(n =
12\) male participants, with \(\alpha =
0.05\). Median scores were \(45.5,
46.5, \text{ and } 47.0\) respectively, suggesting a progressive
increase. The Friedman test was significant \(({\chi^2}_{(2)}=10.9778, p = 0.0041)\)
indicating at least one time point differed.
We therefore proceed to conduct post-hoc Wilcoxon signed-rank tests
to establish the origin of the difference. Below are the post-hoc test
results, each evaluated against the Bonferroni-adjusted significance level \(0.05/3=0.0167\):
## Bonferroni-Wilcoxon Signed Rank test (baseline vs month 1)
## ------------------------------------------------------------
## p-value: 0.1449
## Decision: Fail to reject the null hypothesis
##
## Bonferroni-Wilcoxon Signed Rank test (baseline vs month 2)
## -------------------------------------------------------------
## p-value: 0.0065
## Decision: Reject the null hypothesis
##
## Bonferroni-Wilcoxon Signed Rank test (month 1 vs month 2)
## -------------------------------------------------------------
## p-value: 0.009
## Decision: Reject the null hypothesis
The comparison of baseline vs month 1 was
non-significant \((W = 17, p
= 0.1449)\). However, baseline vs month 2 was
significant \((W = 7, p =
0.0065)\), as was month 1 vs month 2 \((W = 6, p = 0.009)\). The conditioning
program produced a significant improvement in curl-up performance by
month 2, with significant gains observed between month 1 and month 2 as
well, but no detectable change from baseline to month 1.
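The Bonferroni decisions for the three comparisons follow from simple arithmetic, sketched below with the p-values copied from the output above. The dictionary keys and variable names are illustrative.

```python
alpha = 0.05

# One-tailed Wilcoxon p-values copied from the post-hoc output above.
p_values = {
    "baseline vs month 1": 0.1449,
    "baseline vs month 2": 0.0065,
    "month 1 vs month 2": 0.009,
}

# Bonferroni: divide the family-wise level by the number of comparisons.
threshold = alpha / len(p_values)

decisions = {pair: p < threshold for pair, p in p_values.items()}

# Equivalent view: inflate each p-value and compare it with alpha itself.
adjusted = {pair: min(1.0, p * len(p_values)) for pair, p in p_values.items()}

for pair, reject in decisions.items():
    print(pair, "reject H0" if reject else "fail to reject H0")
```

Comparing each raw p-value with \(\alpha/3\) and comparing each tripled p-value with \(\alpha\) always give the same decisions.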
THREE
At the beginning of a session, 12 names are read out in random order
to 10 students. Four are names of prominent sporting personalities
(Group A), four of national and international politicians (Group B) and
four of local dignitaries (Group C). At the end of the session students
are asked to recall as many of the names as possible. The numbers
recalled were:
| student | group A | group B | group C |
| I | 3 | 2 | 0 |
| II | 1 | 1 | 0 |
| III | 2 | 3 | 1 |
| IV | 4 | 3 | 2 |
| V | 3 | 2 | 2 |
| VI | 1 | 0 | 0 |
| VII | 3 | 2 | 4 |
| VIII | 3 | 2 | 1 |
| IX | 2 | 2 | 0 |
| X | 4 | 3 | 2 |
Rank the data within each block (student) and use a Friedman
test to assess evidence of a difference between recall rates
for the three groups. In particular, is the recall rate for group B
and/or group C significantly lower than that for group A? Carry out an
ANOVA on the given data. Do the conclusions agree with the Friedman
test? If not, why not?
\[{H}_{0}:\text{Recall rates are identical
across all three name categories}\\ \\ {H}_{1}:\text{At least one
category has a different recall rate}\]
## Friedman Test
## -----------------
## p-value: 0.0043
## Decision: Reject the null hypothesis
A Friedman test was performed to assess differences in recall rates
across three name categories, sporting personalities (A), politicians
(B), and local dignitaries (C), among \(n=10\) students, with \(\alpha = 0.05\). Median recall counts were
\(3, 2, \text{ and } 1\) for groups A,
B, and C respectively. The test was
significant \(({\chi^2}_{(2)}=10.889, p =
0.0043)\), so we reject the null hypothesis.
ANOVA (student as blocking factor)
## Df Sum Sq Mean Sq F value Pr(>F)
## grp 2 9.867 4.933 9.380 0.00162 **
## student 9 24.533 2.726 5.183 0.00149 **
## Residuals 18 9.467 0.526
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
An ANOVA with name category as the treatment factor and student as a
blocking factor also yielded a significant group effect \((F=9.38, p =
0.002)\), consistent with the Friedman result. The two tests
agree here because the data, while discrete and bounded, do not show
severe departures from the ANOVA assumptions at this sample size.
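The sums of squares in the ANOVA table can be rebuilt directly from the recall counts. The sketch below assumes an additive model with one observation per student and group, and stops at the F ratio because the F-distribution tail is not available in the Python standard library.

```python
# Names recalled per student (rows I to X) for groups A, B and C.
recall = [
    [3, 2, 0], [1, 1, 0], [2, 3, 1], [4, 3, 2], [3, 2, 2],
    [1, 0, 0], [3, 2, 4], [3, 2, 1], [2, 2, 0], [4, 3, 2],
]

n_blocks, n_groups = len(recall), len(recall[0])
grand_total = sum(map(sum, recall))
correction = grand_total ** 2 / (n_blocks * n_groups)

group_totals = [sum(row[j] for row in recall) for j in range(n_groups)]
block_totals = [sum(row) for row in recall]

ss_total = sum(x * x for row in recall for x in row) - correction
ss_group = sum(t * t for t in group_totals) / n_blocks - correction
ss_block = sum(t * t for t in block_totals) / n_groups - correction
ss_resid = ss_total - ss_group - ss_block

df_group = n_groups - 1
df_resid = (n_groups - 1) * (n_blocks - 1)
f_group = (ss_group / df_group) / (ss_resid / df_resid)
print(round(ss_group, 3), round(ss_block, 3), round(ss_resid, 3), round(f_group, 3))
```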
Post-hoc Bonferroni Wilcoxon Signed Rank tests
## Bonferroni-Wilcoxon Signed Rank test (group A vs group B)
## ------------------------------------------------------------
## p-value: 0.0411
## Decision: Fail to reject the null hypothesis
##
## Bonferroni-Wilcoxon Signed Rank test (group A vs group C)
## -------------------------------------------------------------
## p-value: 0.0126
## Decision: Reject the null hypothesis
##
## Bonferroni-Wilcoxon Signed Rank test (group B vs group C)
## -------------------------------------------------------------
## p-value: 0.1142
## Decision: Fail to reject the null hypothesis
The comparison A vs B was non-significant
\((V = 31.5, p = 0.0411)\), falling
above the Bonferroni-adjusted threshold of \(0.05/3 = 0.0167\). The comparison A vs C was
significant \((V = 52, p =
0.0126)\), confirming that group A was recalled significantly
more than group C. The comparison B vs C was
non-significant \((V = 29.5,
p = 0.1142)\). Sporting personalities were recalled at a
significantly higher rate than local dignitaries, though no
statistically significant difference was established between
politicians and sporting personalities or between
politicians and local dignitaries after Bonferroni correction.
FOUR
Four share tipsters are each asked to predict on 10 randomly selected
days whether the London FTSE Index (commonly known as Footsie) will rise
or fall on the following day. If they predict correctly this is scored
as 1, if incorrectly as 0. Do the scores below indicate differences in
tipsters’ ability to predict accurately?
| day | tipster 1 | tipster 2 | tipster 3 | tipster 4 |
| 1 | 1 | 1 | 1 | 1 |
| 2 | 0 | 1 | 1 | 1 |
| 3 | 0 | 1 | 0 | 0 |
| 4 | 1 | 1 | 1 | 0 |
| 5 | 1 | 0 | 1 | 0 |
| 6 | 1 | 1 | 1 | 1 |
| 7 | 1 | 1 | 1 | 1 |
| 8 | 0 | 0 | 1 | 1 |
| 9 | 1 | 0 | 0 | 0 |
| 10 | 1 | 0 | 1 | 1 |
\[{H}_{0}:\text{There is no difference in
prediction accuracy among the four tipsters} \\ \\ {H}_{1}:\text{At
least one tipster differs in prediction accuracy}\]
## Cochran's Q test
## ------------------
## p-value: 0.6974
## Decision: Fail to reject the null hypothesis
Cochran’s Q test was applied to binary prediction outcomes (1 =
correct, 0 = incorrect) from four share tipsters over 10 days, with
\(\alpha = 0.05\) . Correct prediction
counts were \(7, 6, 8, \text{ and } 6\)
out of \(10\) for tipsters 1 through 4
respectively. The test was non-significant \(({Q}_{(3)} = 1.4348, p = 0.6974)\). We
fail to reject \({H}_{0}\): there is
insufficient evidence of a difference in prediction accuracy among the
four tipsters.
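Cochran's Q depends only on the row and column totals of the binary table, so it is easy to verify by hand. The sketch below assumes the standard form of the statistic and the chi-square approximation with \(k - 1 = 3\) degrees of freedom.

```python
import math

# Rows are days 1 to 10, columns are tipsters 1 to 4 (1 = correct prediction).
scores = [
    [1, 1, 1, 1], [0, 1, 1, 1], [0, 1, 0, 0], [1, 1, 1, 0], [1, 0, 1, 0],
    [1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 1, 1], [1, 0, 0, 0], [1, 0, 1, 1],
]

k = len(scores[0])
col_totals = [sum(row[j] for row in scores) for j in range(k)]
row_totals = [sum(row) for row in scores]
grand = sum(row_totals)

q = (k - 1) * (k * sum(c * c for c in col_totals) - grand ** 2) / (
    k * grand - sum(r * r for r in row_totals)
)

# Under H0, Q is approximately chi-square with k - 1 = 3 degrees of freedom.
p_value = math.erfc(math.sqrt(q / 2)) + math.sqrt(2 * q / math.pi) * math.exp(-q / 2)
print(col_totals, round(q, 4), round(p_value, 4))
```

Days on which all four tipsters agree (days 1, 6 and 7 here) contribute nothing to the comparison, which is one reason the test has little power with only 10 days.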
FIVE
Berry (1987) gives the following data for numbers of premature
ventricular contractions per hour for 12 patients with cardiac
arrhythmias when each is treated with 3 drugs A, B, C.
| patient | drug A | drug B | drug C |
| 1 | 170 | 7 | 0 |
| 2 | 19 | 1.4 | 6 |
| 3 | 187 | 205 | 18 |
| 4 | 10 | 0.3 | 1 |
| 5 | 216 | 0.2 | 22 |
| 6 | 49 | 33 | 30 |
| 7 | 7 | 37 | 3 |
| 8 | 474 | 9 | 5 |
| 9 | 0.4 | 0.6 | 0 |
| 10 | 1.4 | 63 | 36 |
| 11 | 27 | 145 | 26 |
| 12 | 29 | 0 | 0 |
Use a Friedman test to investigate differences in response between
drugs. In particular, is there evidence of a difference between drug A
and drug B?
\[{H}_{0}:\text{The distribution of
premature ventricular contractions is identical across drugs A, B and C}
\\ \\ {H}_{1}:\text{At least one drug produces a different contraction
rate}\]
## Friedman test
## ------------------
## p-value: 0.0179
## Decision: Reject the null hypothesis
A Friedman test was conducted to investigate differences in premature
ventricular contractions per hour among \(12\) cardiac arrhythmia patients treated
with three drugs (A, B, C), with \(\alpha =
0.05\). Median contraction counts were \(28, 8, \text{ and } 5.5\) for drugs A, B,
and C respectively. The omnibus test was significant
\(({\chi^2}_{(2)} = 8.0426, p =
0.0179)\). We reject the null hypothesis, concluding that the drug
administered has a significant effect on the number of premature
ventricular contractions per hour.
SIX
Cohen (1983) gives data for numbers of births in Israel for each day
in 1975. We give below data for numbers of births on each day in the
10th, 20th, 30th and 40th weeks of the year.
| week | day | births |
| 10 | Monday | 108 |
| 10 | Tuesday | 106 |
| 10 | Wednesday | 100 |
| 10 | Thursday | 85 |
| 10 | Friday | 85 |
| 10 | Saturday | 92 |
| 10 | Sunday | 96 |
| 20 | Monday | 82 |
| 20 | Tuesday | 99 |
| 20 | Wednesday | 89 |
| 20 | Thursday | 125 |
| 20 | Friday | 74 |
| 20 | Saturday | 85 |
| 20 | Sunday | 100 |
| 30 | Monday | 96 |
| 30 | Tuesday | 101 |
| 30 | Wednesday | 108 |
| 30 | Thursday | 103 |
| 30 | Friday | 108 |
| 30 | Saturday | 96 |
| 30 | Sunday | 110 |
| 40 | Monday | 124 |
| 40 | Tuesday | 106 |
| 40 | Wednesday | 111 |
| 40 | Thursday | 115 |
| 40 | Friday | 99 |
| 40 | Saturday | 96 |
| 40 | Sunday | 111 |
Perform Friedman analyses to determine whether the data indicate a
difference in birth rate between days of the week that shows consistency
over the four selected weeks.
\[{H}_{0}:\text{The distribution of daily
birth counts is consistent across the four selected weeks} \\ \\
{H}_{1}:\text{At least one week shows a different pattern of daily
births}\]
## Friedman test
## ------------------
## p-value: 0.02
## Decision: Reject the null hypothesis
A Friedman test was used to assess whether birth rates across days of
the week were consistent over four selected weeks (weeks 10, 20, 30, 40)
in Israel in 1975, with \(\alpha = 0.05\).
The test was significant \(({\chi^2}_{(3)}=9.838, p = 0.02)\). We
reject \({H}_{0}\). The data indicate
that the pattern of daily births was inconsistent across the four
selected weeks, suggesting week-to-week variation in the distribution of
births by day.
SEVEN
Snee (1985) gives data on average liver weights per bird for chicks
given three levels of growth promoter (none, low, high). Blocks
correspond to different bird houses. Use a Friedman test to see if there
is evidence of an effect of growth promoter.
| house | none | low | high |
| 1 | 3.93 | 3.99 | 4.08 |
| 2 | 3.78 | 3.96 | 3.94 |
| 3 | 3.88 | 3.96 | 4.02 |
| 4 | 3.93 | 4.03 | 4.06 |
| 5 | 3.84 | 4.10 | 3.94 |
| 6 | 3.75 | 4.02 | 4.09 |
| 7 | 3.98 | 4.06 | 4.17 |
| 8 | 3.84 | 3.92 | 4.12 |
\[{H}_{0}:\text{Average liver weight is
identical across the three promoter dose levels}\\ \\ {H}_{1}:\text{At
least one dose level produces a different liver weight}\]
## Friedman test
## ------------------
## p-value: 0.0015
## Decision: Reject the null hypothesis
A Friedman test was performed to assess the effect of growth promoter
dose level (none, low, high) on average liver weight per chick across
\(n = 8\) bird houses, with \(\alpha = 0.05\). Median liver weights were
\(3.860, 4.005, \text{ and } 4.070\)
for none, low, and high doses respectively, suggesting a monotonic
increase. The Friedman test was highly significant \(({\chi^2}_{(2)}=13, p = 0.0015)\). We
reject \({H}_{0}\).
Since dose levels are ordered, a Page test is appropriate. Try this
also.
\[{H}_{0}:\text{Liver weights are
identically distributed across dose levels}\\ \\ {H}_{1}:\text{Liver
weights increase monotonically with dose level}\]
## Jonckheere-Terpstra test
## ------------------------------
## p-value: 3e-04
## Decision: Reject the null hypothesis
Since the dose levels are ordered, a Jonckheere-Terpstra directional
analysis was also conducted. Pairwise results confirmed a significant
increase from none to low \((W = 59, p = 0.003)\) and from
none to high \((W = 62, p = 0.001)\). The
comparison low to high was
non-significant \((W = 43, p =
0.134)\), suggesting the most pronounced effect occurs between
the no-dose and low-dose conditions,
with diminishing gains at higher doses. Overall, the data provide
strong evidence of a dose-ordered increase in average liver weight
attributable to the growth promoter.
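The exercise asks for a Page test, whereas the output above comes from a Jonckheere-Terpstra analysis. As a complement, here is a minimal sketch of Page's \(L\) statistic for the same data, assuming the hypothesised ordering none < low < high and the large-sample normal approximation; it is a separate calculation and not a reproduction of the output shown.

```python
import math

# Liver weights by bird house (rows) for none, low and high promoter levels.
weights = [
    [3.93, 3.99, 4.08], [3.78, 3.96, 3.94], [3.88, 3.96, 4.02], [3.93, 4.03, 4.06],
    [3.84, 4.10, 3.94], [3.75, 4.02, 4.09], [3.98, 4.06, 4.17], [3.84, 3.92, 4.12],
]

n, k = len(weights), len(weights[0])

# Rank within each house; this data set has no ties within a row.
ranks = [[sorted(row).index(v) + 1 for v in row] for row in weights]
rank_sums = [sum(r[j] for r in ranks) for j in range(k)]

# Page's L weights each rank sum by its hypothesised position in the ordering.
page_l = sum((j + 1) * s for j, s in enumerate(rank_sums))

# Null mean and variance of L, then a one-sided normal approximation.
mean_l = n * k * (k + 1) ** 2 / 4
var_l = n * k ** 2 * (k + 1) * (k ** 2 - 1) / 144
z = (page_l - mean_l) / math.sqrt(var_l)
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
print(rank_sums, page_l, z, round(p_one_sided, 5))
```

With only 8 blocks the normal approximation is rough, so exact tables for Page's test would be the safer reference for a final decision.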
5. Tests for Nominal scale data
ONE
A researcher wishes to determine if there is an association between
the level of a teacher’s education and his/her job satisfaction. He
surveyed 158 teachers. The frequencies of the corresponding results are
displayed in Table 8.19.
Table 8.19
| satisfied | 60 | 41 | 19 |
| unsatisfied | 10 | 13 | 15 |
First, use a \({\chi}^{2}\)-test for
independence with \(\alpha = 0.05\) to
determine if there is an association between level of education and job
satisfaction. Then, determine the effect size for the association.
Report your findings.
\[{H}_{0}:\text{Level of education and job
satisfaction are independent}\] \[{H}_{1}:\text{Level of education
and job satisfaction are associated}\]
## Test of independence (Chi-Square approximation)
## ------------------------------------------------
## p-value: 0.0038
## Decision: Reject the null hypothesis
In conclusion, level of education and job satisfaction are significantly
associated at the 5% significance level.
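The exercise also asks for an effect size, which the output above does not show. A minimal sketch follows, assuming Pearson's chi-square statistic without continuity correction and Cramér's V as the effect-size measure (the choice of V is an assumption, not something stated in the original output).

```python
import math

# Table 8.19: rows are satisfied / unsatisfied, columns are education levels.
observed = [[60, 41, 19], [10, 13, 15]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / n) ** 2
    / (row_totals[i] * col_totals[j] / n)
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# For exactly 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
assert df == 2
p_value = math.exp(-chi2 / 2)

# Cramer's V: sqrt(chi2 / (n * (min(rows, cols) - 1))).
cramers_v = math.sqrt(chi2 / (n * (min(len(row_totals), len(col_totals)) - 1)))
print(round(chi2, 4), round(p_value, 4), round(cramers_v, 4))
```

The closed-form tail used here only holds for 2 degrees of freedom; other table sizes need a general chi-square distribution function.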
TWO
A professor gave her class a 10-item survey to determine the
students’ satisfaction with the course. Survey question responses were
measured using a five-point Likert scale. The survey had a score range
from +20 to −20. Table 8.20 shows the scores of the students in a class
of 13 students who rated the professor.
| gender | score | sign |
| male | +12 | + |
| male | +6 | + |
| male | -5 | - |
| male | -10 | - |
| male | +17 | + |
| male | +4 | + |
| female | -2 | - |
| female | -13 | - |
| female | +10 | + |
| female | -8 | - |
| female | -11 | - |
| female | -4 | - |
| female | -14 | - |
Use a Fisher exact test with \(\alpha
= 0.05\) to determine if there is an association between gender
and course satisfaction of the professor’s class. Then, determine the
effect size for the association. Report your findings.
\[{H}_{0}: \text{Gender and course
satisfaction are independent}\] \[{H}_{1}: \text{Gender and course satisfaction are
associated}\]
## Test of independence (Fisher's Exact Test)
## ------------------------------------------------
## p-value: 0.1026
## Decision: Fail to reject the null hypothesis
There is insufficient evidence at the 5% significance level to reject
the null hypothesis of independence between gender and course
satisfaction.
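Counting the signs in Table 8.20 gives a 2 × 2 table (males: 4 positive, 2 negative; females: 1 positive, 6 negative), from which the Fisher exact p-value follows by summing hypergeometric probabilities. The sketch below assumes the common two-sided rule of adding up every table no more likely than the observed one, and reports the phi coefficient as one possible effect size (an assumption, since the output above does not show one).

```python
from math import comb

# 2 x 2 table built from Table 8.20: rows male / female, columns + / - score.
table = [[4, 2], [1, 6]]

(a, b), (c, d) = table
row1, row2, col1 = a + b, c + d, a + c
n = row1 + row2


def hypergeom(x):
    """Probability of x in the top-left cell with all margins held fixed."""
    return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)


lo, hi = max(0, row1 + col1 - n), min(row1, col1)
p_observed = hypergeom(a)

# Two-sided p-value: total probability of tables no more likely than the observed.
p_two_sided = sum(
    hypergeom(x) for x in range(lo, hi + 1) if hypergeom(x) <= p_observed * (1 + 1e-7)
)

# Phi coefficient as an effect size for a 2 x 2 table.
phi = (a * d - b * c) / ((a + b) * (c + d) * (a + c) * (b + d)) ** 0.5
print(round(p_two_sided, 4), round(phi, 4))
```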
THREE
In an English parliamentary electoral constituency a random sample of
400 voters are classified by age and political affiliation as
follows
| | 30 or under | 31-40 | 41-55 | 56 or over |
| Conservative | 31 | 32 | 39 | 34 |
| Liberal Democrat | 16 | 19 | 25 | 31 |
| Labour | 36 | 27 | 58 | 52 |
Is there evidence of an association between political affiliation and
age? It is generally (though not universally) accepted that the
Conservative, Liberal Democrat and Labour parties represent an ordering
of right, middle and left in the political spectrum.
## Test of independence (Chi-Square approximation)
## ------------------------------------------------
## p-value: 0.4435
## Decision: Fail to reject the null hypothesis
There is insufficient evidence at the 5% level of an association
between political affiliation and age.
FOUR
Agresti (1984) quotes the following data on cross-classification of
attitudes towards abortion and amounts of schooling based on the US
General Social Survey, 1972. Test these data for evidence of association
between attitudes and educational background.
| Less than high school | 209 | 101 | 237 |
| High school | 151 | 126 | 426 |
| More than high school | 16 | 21 | 138 |
## Test of independence (Chi-Square approximation)
## ------------------------------------------------
## p-value: 0
## Decision: Reject the null hypothesis
At the 5% significance level, there is sufficient evidence of an
association between subjects' attitudes towards abortion and their
educational background.
6. Variable comparison
ONE
a
A china manufacturer is investigating market response to seven
designs of dinner set. The main markets are the British and
American. To get some idea of preferences in the two markets a
survey of 100 British and 100 American women is carried out and each
woman is asked to rank the designs in order of preference from 1 for
favorite to 7 for least acceptable. For each country the 100 rank scores
for each design are totalled. The design with the lowest total is
assigned rank 1, that with the next lowest total rank 2, and so on.
Overall rankings for each country are:
| design | British | American |
| A | 1 | 3 |
| B | 2 | 4 |
| C | 3 | 1 |
| D | 4 | 5 |
| E | 5 | 2 |
| F | 6 | 7 |
| G | 7 | 6 |
Calculate the Spearman and Kendall correlation coefficients. Is there
evidence of a positive association between orders of preference?
\[\text{Spearman's rank correlation
coefficient}\\
{H}_{0}:{\rho}_{s}=0\\
{H}_{1}:{\rho}_{s}>0\\ \\
\text{Kendall's rank correlation coefficient}\\
{H}_{0}:{\tau}_{s}=0\\
{H}_{1}:{\tau}_{s}>0\]
## Spearman's correlation
## --------------------------
## value: 0.5714
## p-value: 0.1
## Decision: Fail to reject the null hypothesis
##
## ====================================================
##
## Kendall's correlation
## --------------------------
## value: 0.4286
## p-value: 0.1194
## Decision: Fail to reject the null hypothesis
We fail to reject the null hypotheses for both rank coefficients. The
data provide insufficient evidence of a positive association between the
British and American orders of preference for the 7 dinner set
designs.
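Both coefficients for the British and American rankings, and their exact one-sided p-values, can be checked by enumerating all \(7! = 5040\) possible rankings. This is a minimal sketch assuming no tied ranks and an upper-tailed alternative (positive association), which is the direction the reported p-values appear to use.

```python
from itertools import combinations, permutations

# Overall ranks of designs A to G in the two markets.
british = [1, 2, 3, 4, 5, 6, 7]
american = [3, 4, 1, 5, 2, 7, 6]
n = len(british)


def sum_sq_diff(x, y):
    """Sum of squared rank differences used by Spearman's coefficient."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def concordance(x, y):
    """Concordant minus discordant pairs (assumes no tied ranks)."""
    return sum(
        1 if (x[i] - x[j]) * (y[i] - y[j]) > 0 else -1
        for i, j in combinations(range(len(x)), 2)
    )


d2 = sum_sq_diff(british, american)
s = concordance(british, american)
rho = 1 - 6 * d2 / (n * (n * n - 1))
tau = s / (n * (n - 1) / 2)

# Exact one-sided p-values: share of all 7! rankings at least as concordant.
perms = list(permutations(british))
p_rho = sum(sum_sq_diff(british, p) <= d2 for p in perms) / len(perms)
p_tau = sum(concordance(british, p) >= s for p in perms) / len(perms)
print(round(rho, 4), round(p_rho, 4), round(tau, 4), round(p_tau, 4))
```

Full enumeration is only practical for small \(n\); it is used here because 7 designs give just 5040 rankings.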
b
The manufacturer above later decides to assess preferences in the
Canadian and Australian markets by a similar method.
Calculate the Spearman and Kendall correlation coefficients. Is
there evidence of a positive association between orders of
preference?
## Spearman's correlation
## --------------------------
## value: 0
## p-value: 0.5183
## Decision: Fail to reject the null hypothesis
##
## ====================================================
##
## Kendall's correlation
## --------------------------
## value: 0.0476
## p-value: 0.5
## Decision: Fail to reject the null hypothesis
We fail to reject the null hypotheses for both rank coefficients. The
data provide insufficient evidence of a positive association between the
Canadian and Australian orders of preference for the 7 dinner set
designs.
c
Perform an appropriate analysis of the ranked data for all four
countries in Exercises 7.6 and 7.7 to assess the evidence for any
overall concordance.
UK-CAN
## Spearman's correlation
## --------------------------
## value: 0.3929
## p-value: 0.1978
## Decision: Fail to reject the null hypothesis
##
## ====================================================
##
## Kendall's correlation
## --------------------------
## value: 0.2381
## p-value: 0.281
## Decision: Fail to reject the null hypothesis
We fail to reject the null hypothesis for either rank coefficient. At
the 5% significance level, the data do not provide sufficient evidence
of an association between the orders of preference for the 7 dinner set
designs in the British and Canadian markets.
UK-AUS
## Spearman's correlation
## --------------------------
## value: 0.6786
## p-value: 0.0548
## Decision: Fail to reject the null hypothesis
##
## ====================================================
##
## Kendall's correlation
## --------------------------
## value: 0.4286
## p-value: 0.1194
## Decision: Fail to reject the null hypothesis
We fail to reject the null hypothesis for either rank coefficient. At
the 5% significance level, the data do not provide sufficient evidence
of an association between the orders of preference for the 7 dinner set
designs in the British and Australian markets.
USA-CAN
## Spearman's correlation
## --------------------------
## value: 0.8214
## p-value: 0.0171
## Decision: Reject the null hypothesis
##
## ====================================================
##
## Kendall's correlation
## --------------------------
## value: 0.619
## p-value: 0.0345
## Decision: Reject the null hypothesis
We reject the null hypothesis for both rank coefficients. At the 5%
significance level, the data provide sufficient evidence of a positive
association between the orders of preference for the 7 dinner set
designs in the American and Canadian markets.
USA-AUS
## Spearman's correlation
## --------------------------
## value: 0.0357
## p-value: 0.4817
## Decision: Fail to reject the null hypothesis
##
## ====================================================
##
## Kendall's correlation
## --------------------------
## value: 0.0476
## p-value: 0.5
## Decision: Fail to reject the null hypothesis
We fail to reject the null hypothesis for either rank coefficient. At
the 5% significance level, the data do not provide sufficient evidence
of an association between the orders of preference for the 7 dinner set
designs in the American and Australian markets.
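The six pairwise results above can also be combined into a single measure of overall agreement. The following is a minimal sketch, not part of the original analysis: it assumes the four rankings contain no ties, in which case Kendall's coefficient of concordance \(W\) is linked to the mean pairwise Spearman coefficient by \(\bar{r}_{s} = (mW - 1)/(m - 1)\). The variable names are hypothetical, and the input values are the rounded Spearman coefficients reported in the outputs above.

```python
# Pairwise Spearman coefficients as reported in the outputs above (rounded).
pairwise_rs = {
    ("UK", "USA"): 0.5714,
    ("CAN", "AUS"): 0.0,
    ("UK", "CAN"): 0.3929,
    ("UK", "AUS"): 0.6786,
    ("USA", "CAN"): 0.8214,
    ("USA", "AUS"): 0.0357,
}

m = 4  # number of markets (sets of rankings)
n = 7  # number of dinner set designs being ranked

mean_rs = sum(pairwise_rs.values()) / len(pairwise_rs)

# Assumed identity for untied rankings: mean_rs = (m * W - 1) / (m - 1)
kendall_w = ((m - 1) * mean_rs + 1) / m

# Friedman-type statistic; only roughly chi-square with n - 1 df for small n
chi_sq = m * (n - 1) * kendall_w

print(f"mean r_s = {mean_rs:.4f}, W = {kendall_w:.4f}, chi-square = {chi_sq:.2f}")
```

Under these assumptions \(W\) is about 0.56 and the statistic is about 13.5 on 6 degrees of freedom, which exceeds the 5% chi-square critical value of 12.59. With only 4 rankings of 7 items the chi-square approximation is rough, so exact tables for \(W\) would be the safer check.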
TWO
In a pharmacological experiment involving \(\beta\)-blocking agents, Sweeting (1982)
recorded cardiac oxygen consumption (MVO) and left ventricular pressure
(LVP) for a control group of dogs. Calculate the Kendall and Spearman
correlation coefficients. Is there evidence of correlation?
| Dog | MVO | LVP |
|---|---|---|
| A | 78 | 32 |
| B | 92 | 33 |
| C | 116 | 45 |
| D | 90 | 30 |
| E | 106 | 38 |
| F | 78 | 24 |
| G | 99 | 44 |
We test the relevant hypotheses:
\[\text{Spearman's rank correlation coefficient}\\
{H}_{0}:{\rho}_{s}=0\\
{H}_{1}:{\rho}_{s}\neq0\\ \\
\text{Kendall's rank correlation coefficient}\\
{H}_{0}:{\tau}_{s}=0\\
{H}_{1}:{\tau}_{s}\neq0\]
## Spearman's correlation
## --------------------------
## value: 0.9009
## p-value: 0.0056
## Decision: Reject the null hypothesis
##
## ====================================================
##
## Kendall's correlation
## --------------------------
## value: 0.7807
## p-value: 0.0151
## Decision: Reject the null hypothesis
In this pharmacological experiment, cardiac oxygen consumption and left
ventricular pressure were recorded for the 7 dogs (n = 7) in the control
group. The paired observations are ranked for the analysis. The Spearman
rank correlation coefficient was significant \(({r}_{s} = 0.9009, p < 0.05)\).
Kendall’s coefficient was also significant \(({\tau} = 0.7807, p < 0.05)\).
The data show a significant positive monotonic association between
cardiac oxygen consumption and left ventricular pressure.
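The two coefficients can be reproduced by hand from the table above. The following is a minimal pure-Python sketch, not the code that produced the output shown: the function names are hypothetical, Spearman's coefficient is taken as the Pearson correlation of midranks, and Kendall's coefficient is assumed to be the tie-adjusted tau-b (MVO contains one tied pair, 78 and 78).

```python
from math import sqrt

# MVO and LVP for dogs A-G, copied from the table above.
mvo = [78, 92, 116, 90, 106, 78, 99]
lvp = [32, 33, 45, 30, 38, 24, 44]


def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Pearson correlation of the midranks, which accommodates ties."""
    rx, ry = midranks(x), midranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


def kendall_tau_b(x, y):
    """Tau-b: (C - D) / sqrt((n0 - ties_x) * (n0 - ties_y))."""
    n = len(x)
    s = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            prod = dx * dy
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tied
    n0 = n * (n - 1) // 2
    return s / sqrt((n0 - ties_x) * (n0 - ties_y))


r_s = spearman(mvo, lvp)
tau_b = kendall_tau_b(mvo, lvp)
print(f"Spearman r_s = {r_s:.4f}, Kendall tau-b = {tau_b:.4f}")
```

Of the 21 pairs of dogs, 18 are concordant, 2 are discordant and 1 is tied on MVO, which gives \(\tau_b = 16/\sqrt{20 \times 21}\) and agrees with the reported 0.7807.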
THREE
Bardsley and Chambers (1984) gave the numbers of beef cattle and sheep on
19 large farms in a region. Is there evidence of correlation?
| Cattle | Sheep |
|---|---|
| 41 | 4716 |
| 0 | 4605 |
| 42 | 4951 |
| 15 | 2745 |
| 47 | 6592 |
| 0 | 8934 |
| 0 | 9165 |
| 0 | 5917 |
| 56 | 2618 |
| 67 | 1105 |
| 707 | 150 |
| 368 | 2005 |
| 231 | 3222 |
| 104 | 7150 |
| 132 | 8658 |
| 200 | 6304 |
| 172 | 1800 |
| 146 | 5270 |
| 0 | 1537 |
We test the relevant hypotheses:
\[\text{Spearman's rank correlation coefficient}\\
{H}_{0}:{\rho}_{s}=0\\
{H}_{1}:{\rho}_{s}\neq0\\ \\
\text{Kendall's rank correlation coefficient}\\
{H}_{0}:{\tau}_{s}=0\\
{H}_{1}:{\tau}_{s}\neq0\]
## Spearman's correlation
## --------------------------
## value: -0.331
## p-value: 0.1663
## Decision: Fail to reject the null hypothesis
##
## ====================================================
##
## Kendall's correlation
## --------------------------
## value: -0.235
## p-value: 0.168
## Decision: Fail to reject the null hypothesis
The numbers of beef cattle and sheep were recorded for 19 large farms in
the region. These observations are ranked for the analysis. The Spearman
rank correlation coefficient was not significant \(({r}_{s} = -0.331, p > 0.05)\).
The same applies to Kendall’s coefficient \(({\tau} = -0.235, p > 0.05)\).
Although both sample coefficients are negative, there is insufficient
evidence of a monotonic association between the numbers of beef cattle
and sheep kept on farms in that region.
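Five farms report zero cattle, so the cattle counts are heavily tied and an exact Kendall test is not straightforward. The sketch below shows one way a p-value close to the reported one can be obtained. It assumes a two-sided normal approximation for \(S = C - D\) with the usual tie-corrected variance and no continuity correction; whether the original analysis used exactly this procedure is an assumption, and the helper names are hypothetical.

```python
from collections import Counter
from math import erfc, sqrt

# (cattle, sheep) counts for the 19 farms, copied from the table above.
farms = [
    (41, 4716), (0, 4605), (42, 4951), (15, 2745), (47, 6592),
    (0, 8934), (0, 9165), (0, 5917), (56, 2618), (67, 1105),
    (707, 150), (368, 2005), (231, 3222), (104, 7150), (132, 8658),
    (200, 6304), (172, 1800), (146, 5270), (0, 1537),
]
cattle = [c for c, _ in farms]
sheep = [s for _, s in farms]
n = len(farms)

# S = concordant pairs minus discordant pairs; tied pairs contribute 0.
s = 0
for i in range(n):
    for j in range(i + 1, n):
        prod = (cattle[j] - cattle[i]) * (sheep[j] - sheep[i])
        s += (prod > 0) - (prod < 0)


def tie_sizes(values):
    """Sizes of the groups of tied values (groups of size 1 are dropped)."""
    return [t for t in Counter(values).values() if t > 1]


tx, ty = tie_sizes(cattle), tie_sizes(sheep)

# Tau-b uses the number of tied pairs in each variable.
n0 = n * (n - 1) // 2
n1 = sum(t * (t - 1) // 2 for t in tx)
n2 = sum(u * (u - 1) // 2 for u in ty)
tau_b = s / sqrt((n0 - n1) * (n0 - n2))

# Variance of S under H0, corrected for ties in either variable.
v0 = n * (n - 1) * (2 * n + 5)
vt = sum(t * (t - 1) * (2 * t + 5) for t in tx)
vu = sum(u * (u - 1) * (2 * u + 5) for u in ty)
v1 = sum(t * (t - 1) for t in tx) * sum(u * (u - 1) for u in ty)
v2 = (sum(t * (t - 1) * (t - 2) for t in tx)
      * sum(u * (u - 1) * (u - 2) for u in ty))
var_s = ((v0 - vt - vu) / 18
         + v1 / (2 * n * (n - 1))
         + v2 / (9 * n * (n - 1) * (n - 2)))

z = s / sqrt(var_s)
p_two_sided = erfc(abs(z) / sqrt(2))  # equals 2 * P(Z > |z|) for standard normal Z

print(f"S = {s}, tau-b = {tau_b:.4f}, z = {z:.3f}, p = {p_two_sided:.4f}")
```

Because the sheep counts contain no ties, only the cattle tie group of size 5 enters the correction, and the variance reduces to \((v_0 - v_t)/18\).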
7. References
- Conover, W. J. (1999). Practical Nonparametric Statistics (3rd ed.). John Wiley & Sons.
- Corder, G. W., & Foreman, D. I. (2014). Nonparametric Statistics: A Step-by-Step Approach (2nd ed.). John Wiley & Sons.