A study was made of all 26 astronauts on the first eight space shuttle flights (Bungo et al., 1985). On a voluntary basis, 17 astronauts consumed large quantities of salt and fluid prior to landing as a countermeasure to space de-conditioning, while nine did not. The table below shows supine heart rates (beats/minute) before and after flights in the space shuttle. Please use the data in file ex5_1.sav.
ANSWER: Before starting, let’s run some simple exploratory analysis to get a feel for the data
library(tidyverse)
library(haven)
library(modelsummary)
df <- read_sav("data/ex5_1.sav") |>
  mutate(across(where(is.double), as.double)) |>
  mutate(
    counter = fct_recode(
      factor(counter),
      countermeasure = "0",
      no_countermeasure = "1"
    )
  )
df |>
  filter(counter == "countermeasure") |>
  select(-counter) |>
  datasummary_skim(title = "Summary with countermeasure")
Unique (#) | Missing (%) | Mean | SD | Min | Median | Max | ||
---|---|---|---|---|---|---|---|---|
pre | 13 | 0 | 56.9 | 7.3 | 48.0 | 55.0 | 71.0 | |
post | 14 | 0 | 63.8 | 8.9 | 47.0 | 65.0 | 77.0 | |
change | 16 | 0 | 6.9 | 10.7 | −10.0 | 6.0 | 29.0 |
df |>
  filter(counter == "no_countermeasure") |>
  select(-counter) |>
  datasummary_skim(title = "Summary without countermeasure")
Unique (#) | Missing (%) | Mean | SD | Min | Median | Max | ||
---|---|---|---|---|---|---|---|---|
pre | 6 | 0 | 57.2 | 8.4 | 52.0 | 54.0 | 78.0 | |
post | 7 | 0 | 74.7 | 13.0 | 61.0 | 77.0 | 103.0 | |
change | 8 | 0 | 17.4 | 10.1 | 0.0 | 24.0 | 27.0 |
The summaries above help us evaluate the distributions of the data and make an initial comparison between the two groups. We can now turn to the analysis itself. Let’s begin with the parametric approach.
library(broom)
# parametric approach
df_cm <- df |>
  filter(counter == "countermeasure")
t.test(df_cm$change) |>
  tidy()
## # A tibble: 1 × 8
## estimate statistic p.value parameter conf.low conf.high method alternative
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 6.88 2.65 0.0174 16 1.38 12.4 One Sampl… two.sided
For the parametric approach, with \(\delta\) the mean change (post minus pre) in the countermeasure group: \[ H_0: \delta = 0 \\ H_1: \delta \ne 0 \]
Using a paired t-test (which is identical to a one-sample t-test on the differences), we get a p-value of 0.017; the 95% CI for the mean difference is \([1.38; 12.4]\).
Conclusion: we reject H0 that the mean change equals zero.
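To see why the paired t-test and the one-sample t-test on the differences are the same thing, here is a minimal sketch in base R. The `pre` and `post` vectors are made-up numbers for illustration only, not the values from ex5_1.sav.

```r
# Hypothetical pre/post heart rates for six subjects (made-up numbers,
# NOT the values from ex5_1.sav).
pre  <- c(52, 60, 55, 48, 71, 63)
post <- c(58, 66, 54, 59, 80, 65)

# Paired t-test on the two columns ...
paired <- t.test(post, pre, paired = TRUE)

# ... and a one-sample t-test on the within-subject differences.
one_sample <- t.test(post - pre, mu = 0)

# Both give the same t statistic, p-value and confidence interval.
same_t  <- all.equal(unname(paired$statistic), unname(one_sample$statistic))
same_p  <- all.equal(paired$p.value, one_sample$p.value)
same_ci <- all.equal(as.numeric(paired$conf.int), as.numeric(one_sample$conf.int))
c(same_t, same_p, same_ci)
```

This is why the analysis above simply passes the `change` column to `t.test()`.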
Let’s now do some non-parametric testing. Here the null hypothesis is that the pre and post measurements come from the same distribution, so that their differences are symmetrically distributed around zero. Let’s run the Wilcoxon signed rank test.
## # A tibble: 1 × 4
## statistic p.value method alternative
## <dbl> <dbl> <chr> <chr>
## 1 111 0.0261 Wilcoxon signed rank test two.sided
ANSWER: The p-value is 0.026, so we reject the null hypothesis.
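The call that produced the signed rank output is not shown above. A minimal sketch of how such a test could be run is given below. The `change` vector is made-up, and the choice of `exact = FALSE, correct = FALSE` is an assumption based on the method label in the output (no continuity correction is mentioned there).

```r
# Hypothetical within-subject changes (made-up, not from ex5_1.sav).
change <- c(6, 6, -1, 11, 9, 2, -4, 15)

# One-sample Wilcoxon signed rank test of H0: the changes are
# symmetrically distributed around zero.
# exact = FALSE requests the normal approximation;
# correct = FALSE switches off the continuity correction.
res <- wilcox.test(change, mu = 0, exact = FALSE, correct = FALSE)

res$statistic  # V: the sum of the ranks of the positive changes
res$p.value
```

In the real analysis the argument would be the `change` column of the countermeasure group, for example `df_cm$change`.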
ANSWER: The first group seems rather normal, the second less so.
# non-parametric approach
wilcox.test(change ~ counter,
            data = df,
            exact = FALSE,
            correct = FALSE) |>
  tidy()
## # A tibble: 1 × 4
## statistic p.value method alternative
## <dbl> <dbl> <chr> <chr>
## 1 36.5 0.0309 Wilcoxon rank sum test two.sided
Our p-value is 0.031, the same as the SPSS asymptotic significance test. SPSS also reports the result of an exact test, but R warns when we set the exact argument to TRUE: the data contain ties, so wilcox.test cannot compute an exact p-value and falls back on the normal approximation. In any case, since P < 0.05 we can reject the null hypothesis. There is a tendency towards larger changes in the group that did not take countermeasures.
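The behaviour of the exact argument with tied data can be checked on a small example. The sketch below uses two made-up samples that share one value. It captures whatever warning `wilcox.test()` raises instead of assuming its exact wording, which may differ between R versions.

```r
# Two hypothetical samples that share the value 5 (made-up numbers).
x <- c(3, 5, 7, 9)
y <- c(5, 8, 10, 12, 14)

# Ask for an exact test and capture the warning message, if any.
msg <- tryCatch(
  {
    wilcox.test(x, y, exact = TRUE)
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
msg

# Requesting the normal approximation directly avoids the warning.
p_approx <- wilcox.test(x, y, exact = FALSE)$p.value
p_approx
```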
ANSWER: It is incorrect to analyze multiple observations on the same individuals as if they were from different people. The assumption of independence is not satisfied. Information coming from the same source is treated as if it came from independent sources, which means that we are overestimating our degrees of freedom and therefore we are more likely to make errors of type I.
ANSWER: In clinical research it is highly undesirable to let subjects choose their own treatments. No clinical trial conducted in this way would have credibility. Ideally (from the research point of view) the astronauts should have been randomized to receive the dietary countermeasure, but this was not set up as a prospective study.
The table below shows the concentration of antibody to Type III Group B Streptococcus (GBS) in 20 volunteers before and after immunization (Baker et al., 1980). See also data file ex5_2.sav.
Unique (#) | Missing (%) | Mean | SD | Min | Median | Max | ||
---|---|---|---|---|---|---|---|---|
subject | 20 | 0 | 10.5 | 5.9 | 1.0 | 10.5 | 20.0 | |
before | 9 | 0 | 0.7 | 0.4 | 0.4 | 0.6 | 2.0 | |
after | 12 | 0 | 1.9 | 3.0 | 0.4 | 0.8 | 12.2 | |
change | 11 | 0 | 1.2 | 2.9 | −0.1 | 0.1 | 11.6 |
ANSWER: H0: mean concentration before immunization equals the mean concentration after immunization in the population; H1: H0 not true
t = 1.8; P-value > 0.05.
Comment on this result. What method would be more appropriate, and why?
ANSWER: The paired t-test gives P = 0.08. However, the test assumes that the differences are reasonably Normally distributed, which is clearly not the case here. A log transformation does not solve the problem; the distribution is still skewed. A non-parametric approach is therefore recommended. The Wilcoxon signed rank test assumes a symmetric distribution of the differences, which does not hold here either, so the sign test can be used instead.
## # A tibble: 1 × 4
## statistic p.value method alternative
## <dbl> <dbl> <chr> <chr>
## 1 76 0.00412 Wilcoxon signed rank test with continuity corre… two.sided
ANSWER: As before, R warns when we try to run an exact test. The asymptotic test yields a p-value of 0.00412; SPSS calculates a P-value of 0.006.
In both cases, we reject the null hypothesis that the data before immunization come from the same population as the data after immunization. After immunization, people appear to have higher concentrations than before.
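The sign test recommended earlier has no dedicated function in base R, but it reduces to a binomial test on the number of positive differences. Here is a minimal sketch. The `before` and `after` vectors are made-up numbers, not the values from ex5_2.sav.

```r
# Hypothetical before/after concentrations (made-up, not from ex5_2.sav).
before <- c(0.4, 0.4, 0.5, 0.6, 0.6, 0.7, 0.9, 1.1, 1.2, 2.0)
after  <- c(0.4, 0.5, 0.6, 0.9, 0.5, 1.3, 2.1, 1.9, 6.1, 12.2)

d <- after - before
d <- d[d != 0]        # the sign test discards zero differences

n_pos <- sum(d > 0)   # number of increases
n     <- length(d)    # number of non-zero differences

# Under H0 (median difference = 0) increases and decreases are equally
# likely, so n_pos follows a Binomial(n, 0.5) distribution.
sign_test <- binom.test(n_pos, n, p = 0.5, alternative = "two.sided")
sign_test$p.value
```

The sign test uses only the direction of each change, so it needs no symmetry assumption, at the cost of some power.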
Forty patients receiving chemotherapy as outpatients were randomized to receive either an active antiemetic treatment (n=20) or placebo (n=20) (Williams et al., 1989). The following table shows measurements (in mm) on a 100 mm linear analogue self-assessment scale for nausea. See also data file ex5_3.sav.
ANSWER: H0: mean nausea is the same for both groups in the population; H1: H0 not true
ANSWER: The data do not meet the distributional assumptions for parametric methods based on the t-distribution. The groups can be compared using the non-parametric Mann-Whitney test, which tests the null hypothesis that the data in both groups come from the same distribution.
Let’s run both tests, for fun:
## # A tibble: 1 × 10
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 -30.2 17.0 47.1 -4.00 0.000328 33.7 -45.5 -14.8
## # ℹ 2 more variables: method <chr>, alternative <chr>
## # A tibble: 1 × 4
## statistic p.value method alternative
## <dbl> <dbl> <chr> <chr>
## 1 78.5 0.00104 Wilcoxon rank sum test with continuity correcti… two.sided
ANSWER: Both tests give a significant result; the non-parametric test gives P = 0.001 (as does the SPSS output). We can reject the null hypothesis: the placebo group tends to have higher nausea scores.
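The calls behind the two outputs are not shown above. The non-integer degrees of freedom (33.7) suggest a Welch t-test, which is R's default. A minimal sketch of running both tests follows. The data frame `nausea` and its values are made-up, not the contents of ex5_3.sav.

```r
# Hypothetical nausea scores in mm (made-up, not from ex5_3.sav).
nausea <- data.frame(
  score = c(0, 2, 5, 8, 12, 15, 20, 35,
            10, 25, 38, 45, 52, 60, 71, 88),
  group = factor(rep(c("active", "placebo"), each = 8))
)

# Welch two-sample t-test (R's default: variances not assumed equal).
welch <- t.test(score ~ group, data = nausea)

# Mann-Whitney / Wilcoxon rank sum test using the normal approximation.
mw <- wilcox.test(score ~ group, data = nausea, exact = FALSE)

c(welch = welch$p.value, mann_whitney = mw$p.value)
```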
Patients with chronic renal failure undergoing haemodialysis were divided into groups with low or normal plasma heparin cofactor II (HCII) levels (Toulon et al., 1987). Five months later the acute effects of haemodialysis were studied by analyzing plasma samples taken before and after haemodialysis. As dialysis increases total protein concentration in plasma, the ratio of HCII to protein was calculated with results shown in the following table:
These data are also in the data file ex5_4.sav. The aim was to compare the change in both groups. The data were analyzed by separate paired Wilcoxon tests on the data for each group, giving P-value < 0.01 for group 1 and P-value > 0.05 for group 2.
Unique (#) | Missing (%) | Mean | SD | Min | Median | Max | ||
---|---|---|---|---|---|---|---|---|
before | 11 | 0 | 1.0 | 0.3 | 0.7 | 1.0 | 1.4 | |
after | 12 | 0 | 1.1 | 0.3 | 0.7 | 1.0 | 1.5 | |
change | 12 | 0 | 0.1 | 0.1 | −0.1 | 0.1 | 0.2 |
Unique (#) | Missing (%) | Mean | SD | Min | Median | Max | ||
---|---|---|---|---|---|---|---|---|
before | 12 | 0 | 1.5 | 0.3 | 1.2 | 1.5 | 2.1 | |
after | 11 | 0 | 1.6 | 0.3 | 1.2 | 1.5 | 2.1 | |
change | 11 | 0 | 0.1 | 0.1 | −0.1 | 0.0 | 0.4 |
Let’s run the tests, shall we?
df_low <- df |> filter(group == "low")
df_normal <- df |> filter(group == "normal")
wilcox.test(df_low$change, exact = FALSE) |>
tidy() # p = 0.01347
## # A tibble: 1 × 4
## statistic p.value method alternative
## <dbl> <dbl> <chr> <chr>
## 1 71 0.0135 Wilcoxon signed rank test with continuity corre… two.sided
wilcox.test(df_normal$change, exact = FALSE) |>
  tidy()
## # A tibble: 1 × 4
## statistic p.value method alternative
## <dbl> <dbl> <chr> <chr>
## 1 52.5 0.0908 Wilcoxon signed rank test with continuity corre… two.sided
ANSWER: The p-values of the paired Wilcoxon tests for groups 1 and 2 are 0.0135 and 0.091 (in SPSS 0.01 and 0.08). These p-values are not far apart, and the mean changes in the two groups are almost the same. One significant and one non-significant result do not show that the groups differ. The correct way to compare the groups is to test the difference between the changes in the two groups directly, using the two-sample t-test or the Mann-Whitney test.
## # A tibble: 1 × 4
## statistic p.value df df.residual
## <dbl> <dbl> <int> <int>
## 1 1.53 0.229 1 22
## # A tibble: 1 × 10
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.00667 0.0775 0.0708 0.156 0.878 22 -0.0820 0.0954
## # ℹ 2 more variables: method <chr>, alternative <chr>
## # A tibble: 1 × 4
## statistic p.value method alternative
## <dbl> <dbl> <chr> <chr>
## 1 85.5 0.453 Wilcoxon rank sum test with continuity correcti… two.sided
ANSWER: As before, H0: the mean change is the same in both groups, \(\delta_1 - \delta_2 = 0\); H1: \(\delta_1 - \delta_2 \ne 0\). Levene’s test does not reject the hypothesis that the variances are equal (R reports a p-value of 0.23, SPSS a p-value of 0.14), and since there is no indication that the data in either group deviate from a normal distribution, the two-sample t-test can be used.
The t-test gives a p-value of 0.88 (the same in R and SPSS), so we do not reject the null hypothesis: there is no significant difference. The non-parametric Wilcoxon rank sum test gives a p-value of 0.453, which does not change this decision.
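The sequence used here (check the variances, then compare the changes directly) can be sketched in base R. Two things are assumptions: the numbers are made-up rather than taken from ex5_4.sav, and the variance check is a hand-rolled Levene-type test centred on the group medians (the Brown-Forsythe variant), which may not be the exact variant behind the output above.

```r
# Hypothetical changes in two groups (made-up, not from ex5_4.sav).
dat <- data.frame(
  change = c(0.10, 0.05, 0.12, -0.03, 0.08, 0.15,
             0.02, 0.20, -0.05, 0.07, 0.01, 0.11),
  group  = factor(rep(c("low", "normal"), each = 6))
)

# Levene-type test (Brown-Forsythe variant): a one-way ANOVA on the
# absolute deviations from each group's median.
dat$abs_dev <- abs(dat$change - ave(dat$change, dat$group, FUN = median))
levene   <- anova(lm(abs_dev ~ group, data = dat))
levene_p <- levene[["Pr(>F)"]][1]

# If equal variances are plausible, use the pooled two-sample t-test.
pooled <- t.test(change ~ group, data = dat, var.equal = TRUE)

# Rank-based check of the same comparison.
mw <- wilcox.test(change ~ group, data = dat, exact = FALSE)

c(levene = levene_p, t_test = pooled$p.value, mann_whitney = mw$p.value)
```

Centring on the mean instead of the median gives a different p-value, which may be one reason why packages disagree on the result of this test.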
A group of 20 patients in remission from Hodgkin’s disease and a group of 20 patients in remission from other diverse, disseminated malignancies (called the non-Hodgkin’s disease group) were compared with respect to the number of T4 cells per mm³ in their blood. The numbers are given in the table below. See also data file ex5_5.sav.
## # A tibble: 1 × 10
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 -301. 522. 823. -2.11 0.0436 28.5 -593. -9.29
## # ℹ 2 more variables: method <chr>, alternative <chr>
## # A tibble: 1 × 4
## statistic p.value method alternative
## <dbl> <dbl> <chr> <chr>
## 1 135 0.0810 Wilcoxon rank sum test with continuity correcti… two.sided
We choose the non-parametric approach, since the data are rather skewed. The Mann-Whitney test gives a P-value of 0.08, so at the 5% level there is no significant evidence that the two groups differ with respect to T4 counts.
## # A tibble: 1 × 4
## statistic p.value df df.residual
## <dbl> <dbl> <int> <int>
## 1 1.02 0.320 1 38
## # A tibble: 1 × 10
## estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 -0.398 6.09 6.49 -1.88 0.0682 38 -0.828 0.0313
## # ℹ 2 more variables: method <chr>, alternative <chr>
## # A tibble: 1 × 4
## statistic p.value method alternative
## <dbl> <dbl> <chr> <chr>
## 1 135 0.0810 Wilcoxon rank sum test with continuity correcti… two.sided
ANSWER: After the transformation, the data are less skewed and, more importantly, the variances can be assumed equal (based on Levene’s test). The two-sample t-test gives a P-value of 0.07, which is very close to the non-parametric result. The null hypothesis that the mean T4 counts are the same in both groups cannot be rejected.
## [1] 0.4368983 0.6714660 1.0319715
ANSWER: On the log scale, the mean difference is -0.398, with a 95% CI of [-0.828; 0.031]. Back-transforming gives the ratio of the geometric means of the two groups: exp(-0.398) = 0.67, with a 95% CI of [0.44; 1.03].
Interpretation: we are 95% confident that the ratio of the two geometric means lies within this interval.
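The back-transformation is a one-liner. The sketch below starts from the estimate and confidence limits reported in the t-test output on the log scale; the vector name `log_diff` is our own.

```r
# Estimate and 95% CI for the difference in mean log(T4), as reported above.
log_diff <- c(lower = -0.828, estimate = -0.398, upper = 0.0313)

# A difference of logs is the log of a ratio, so exp() turns the
# difference in mean logs into a ratio of geometric means.
ratio <- exp(log_diff)
round(ratio, 2)
```

Because the interval for the ratio contains 1, it agrees with the non-significant t-test on the log scale.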
Twenty-two patients undergoing cardiac bypass surgery were randomized to one of three ventilation groups: Group I received a 50% nitrous oxide and 50% oxygen mixture continuously for 24 hours; Group II received a 50% nitrous oxide and 50% oxygen mixture only during the operation; Group III received no nitrous oxide and 35-50% oxygen mixture continuously for 24 hours. In data file ex5_6.sav, the red cell folate levels for the three groups after 24 hours ventilation are given. The aim of this study was to compare the three groups and test whether they have the same red cell folate levels.
(a) Make a scatter plot of the data (group on the horizontal axis). Based on this plot, what are your first conclusions with respect to the mean and variances of the different groups?
ANSWER: The groups seem to differ with respect to their means, but also with respect to their variation (the ranges differ in length).
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 15516 7758 3.711 0.0436 *
## Residuals 19 39716 2090
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ANSWER: The ANOVA gives a significant result (P = 0.044), which means that the null hypothesis of equal means in the three groups must be rejected. However, according to the test of homogeneity of variances, the hypothesis of equal variances must also be rejected, so a transformation must be considered.
ANSWER: The test of equal variances for the transformed data shows that the assumption of equal variances can be made. The ANOVA on the transformed data gives a P-value of 0.049, which is on the boundary of significance. There is some indication that at least one pair of groups has different means.
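The workflow described in the last two answers (ANOVA, check the variances, transform, repeat) can be sketched as follows. The data are made-up, not the contents of ex5_6.sav. Bartlett's test is used here only because it ships with base R; it is a stand-in for whichever homogeneity test produced the results quoted above.

```r
# Hypothetical folate levels for three groups (made-up, not from ex5_6.sav).
folate <- data.frame(
  rcfl  = c(250, 280, 310, 350, 390,
            210, 230, 250, 270, 300,
            240, 260, 275, 295, 330),
  group = factor(rep(c("g_I", "g_II", "g_III"), each = 5))
)

# One-way ANOVA on the raw scale.
fit_raw <- aov(rcfl ~ group, data = folate)
summary(fit_raw)

# Homogeneity of variances (Bartlett's test as a base-R stand-in).
var_raw <- bartlett.test(rcfl ~ group, data = folate)
var_log <- bartlett.test(log(rcfl) ~ group, data = folate)

# The same ANOVA after a log transformation.
fit_log <- aov(log(rcfl) ~ group, data = folate)
p_log   <- summary(fit_log)[[1]][["Pr(>F)"]][1]

c(bartlett_raw = var_raw$p.value, bartlett_log = var_log$p.value, anova_log = p_log)
```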
##
## Pairwise comparisons using t tests with pooled SD
##
## data: df$rcfl and df$group
##
## g_I g_II
## g_II 0.042 -
## g_III 0.464 1.000
##
## P value adjustment method: bonferroni
ANSWER: The pairwise comparisons with Bonferroni correction indicate that groups 1 and 2 differ with respect to their means, although only marginally (P-value = 0.042 in the R output above).
##
## Kruskal-Wallis rank sum test
##
## data: rcfl by group
## Kruskal-Wallis chi-squared = 4.1852, df = 2, p-value = 0.1234
ANSWER: The Kruskal-Wallis test gives a P-value of 0.123: no significant differences between the three groups.
ANSWER: In a paper, you should be very cautious about concluding that there are differences between the three groups. To stay on the safe side, you might publish the non-parametric results.
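For completeness, here is a minimal sketch of the pairwise comparisons and the Kruskal-Wallis test used above, again on made-up data rather than ex5_6.sav. It also shows what the Bonferroni adjustment does: with three comparisons, each raw p-value is multiplied by 3 and capped at 1, which is why adjusted values of exactly 1.000 can appear in the output.

```r
# Hypothetical folate levels for three groups (made-up, not from ex5_6.sav).
folate <- data.frame(
  rcfl  = c(250, 280, 310, 350, 390,
            210, 230, 250, 270, 300,
            240, 260, 275, 295, 330),
  group = factor(rep(c("g_I", "g_II", "g_III"), each = 5))
)

# Pairwise t-tests with pooled SD, without and with Bonferroni adjustment.
pw_raw  <- pairwise.t.test(folate$rcfl, folate$group,
                           p.adjust.method = "none")
pw_bonf <- pairwise.t.test(folate$rcfl, folate$group,
                           p.adjust.method = "bonferroni")

# Bonferroni by hand: multiply by the number of comparisons, cap at 1.
by_hand <- pmin(3 * pw_raw$p.value, 1)
all.equal(pw_bonf$p.value, by_hand)

# Kruskal-Wallis rank sum test as the non-parametric alternative to ANOVA.
kw <- kruskal.test(rcfl ~ group, data = folate)
kw$p.value
```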