Non-Parametric HW

Author

Ingrid Gonzalez

Problem 1

In an investigation of visual scanning behavior of deaf children, measurements of eye-movement rate were taken on nine deaf and nine hearing children. Researchers wanted to know whether the distributions of eye-movement rates for deaf children and hearing children differ?

Deaf children: 2.75, 2.14, 3.23, 2.07, 2.49, 2.18, 3.16, 2.93, 2.2
Hearing children: 0.89, 1.43, 1.06, 1.01, 0.94, 1.79, 1.12, 2.01, 1.12

Code

df1 <- data.frame(rates = c(2.75, 2.14, 3.23, 2.07, 2.49, 2.18, 3.16, 2.93, 2.2,
                            0.89, 1.43, 1.06, 1.01, 0.94, 1.79, 1.12, 2.01, 1.12),
                  group = c(rep('Deaf', 9), rep('Hearing',9)))
df1

   rates   group
1   2.75    Deaf
2   2.14    Deaf
3   3.23    Deaf
4   2.07    Deaf
5   2.49    Deaf
6   2.18    Deaf
7   3.16    Deaf
8   2.93    Deaf
9   2.20    Deaf
10  0.89 Hearing
11  1.43 Hearing
12  1.06 Hearing
13  1.01 Hearing
14  0.94 Hearing
15  1.79 Hearing
16  1.12 Hearing
17  2.01 Hearing
18  1.12 Hearing

Code

df1 %>%
  group_by(group) %>%
  get_summary_stats(rates, show = c("mean", "sd", "median", "min", "max"))

# A tibble: 2 × 8
  group   variable     n  mean    sd median   min   max
  <chr>   <fct>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>
1 Deaf    rates        9  2.57 0.458   2.49  2.07  3.23
2 Hearing rates        9  1.26 0.396   1.12  0.89  2.01

Code

ggboxplot(df1, x = "group", y = "rates",
          xlab = "Groups", ylab = "Rate", add = "jitter")

Method: Two-Sample Independent t-test

Assumptions of the method:
- Observations are continuous or ordinal
- Observations within each sample must be independent
- The two samples must be normally distributed
- The two samples must have equal variance
- Sample size should be large

Hypotheses:
H₀: µ₁ = µ₂
The distributions of eye-movement rates for deaf children and hearing children are the same.

H₁: µ₁ ≠ µ₂
The distribution of eye-movement rates for deaf children and hearing children are different.

Calculation:
t = 6.489
CI = (0.881, 1.736)

Code

## Student t-test
t.test(rates ~ group, data = df1, var.equal = TRUE)


    Two Sample t-test

data:  rates by group
t = 6.4893, df = 16, p-value = 7.467e-06
alternative hypothesis: true difference in means between group Deaf and group Hearing is not equal to 0
95 percent confidence interval:
 0.8813049 1.7364729
sample estimates:
   mean in group Deaf mean in group Hearing 
             2.572222              1.263333

Statistical Decision & Interpretation:
Reject H₀

There were a total of 18 participants in this study, 9 in the deaf group and 9 in the hearing group. The mean eye-movement rate for deaf children was 2.57 (Md = 2.49, SD = 0.458), whereas the mean eye-movement rate in hearing children was 1.26 (Md = 1.26, SD = 0.396). There were no outliers in the data, as assessed by inspection of a boxplot. Eye movement rates for each level of hearing status were normally distributed, as assessed by Shapiro-Wilk’s test (p > .05), and there was homogeneity of variances, as assessed by Levene’s test for equality of variances (p > 0.05). An independent two-sample t-test showed that the difference was statistically significant, t(16) = 6.49, p < 0.001. The distribution of eye-movement rate for deaf children are significantly different than for hearing children.

On average, deaf children have a higher rate of eye-movement than hearing children (2.57 vs. 1.26).

Method: Mann-Whitney U

Assumptions of the method:
- Observations are continuous or ordinal
- Observations within each sample must be independent

Calculation:
W = 81
CI = (0.95, 1.81)

Code

## Mann-Whitney U
wilcox.test(rates ~ group, data = df1, na.rm = TRUE, 
                   paired = FALSE, exact = FALSE, conf.int = TRUE)


    Wilcoxon rank sum test with continuity correction

data:  rates by group
W = 81, p-value = 0.0004095
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 0.9500044 1.8100131
sample estimates:
difference in location 
              1.240078

Statistical Decision & Interpretation:
Reject H₀

There were a total of 18 participants in this study, 9 in the deaf group and 9 in the hearing group. The mean eye-movement rate for deaf children was 2.57 (Ms = 2.49, SD = 0.458), whereas the mean eye-movement rate in hearing children was 1.26 (Md = 1.26, SD = 0.396). Mann-Whitney U showed that the difference was statistically significant, W = 81, p < 0.001. The distribution of eye-movement rate for deaf children are significantly different than for hearing children.

On average, deaf children have a higher rate of eye-movement than hearing children (2.57 vs. 1.26).

Method: Two-Sample Independent t-test

Observed Test Statistics:

Code

## Permutation
samp1 <- c(2.75, 2.14, 3.23, 2.07, 2.49, 2.18, 3.16, 2.93, 2.2)
samp2 <- c(0.89, 1.43, 1.06, 1.01, 0.94, 1.79, 1.12, 2.01, 1.12)

## t-test
R <- 9999
all <- c(samp1, samp2)
k <- 1:length(all)
reps <- numeric(R)
# calculate the observed test statistic value
ts <- t.test(samp1, samp2, var.equal = TRUE)$statistic # extracts TS value from t.test
ts # print the observed TS value

       t 
6.489299

p-value:

Code

for (i in 1:R) {
  # generate m indices for the first sample
  m <- sample(k, size = length(samp1), replace = FALSE)
  s1 <- all[m]
  s2 <- all[-m]
  # calculate ith permuted test statistic
  reps[i] <- t.test(s1, s2, var.equal = TRUE)$statistic
  }
pvalue <- mean(c(ts, reps) >= ts) # calculate the p-value
pvalue

[1] 1e-04

Statistical Decision & Interpretation:
Reject H₀

There were a total of 18 participants in this study, 9 in the deaf group and 9 in the hearing group. The mean eye-movement rate for deaf children was 2.57 (SD = 0.458), whereas the mean eye-movement rate in hearing children was 1.26 (SD = 0.396). T-statistics was used for permutation and the exact p-value was similar to the independent two-sample t-test. Results showed that the difference was statistically significant p < 0.001 and we can conclude that the distribution of eye-movement rate for deaf children are significantly different than for hearing children.

On average, deaf children have a higher rate of eye-movement than hearing children (2.57 vs. 1.26).

Method: Mann-Whitney U

Observed Test Statistics:

Code

## Permutation
A <- c(2.75, 2.14, 3.23, 2.07, 2.49, 2.18, 3.16, 2.93, 2.2)
B <- c(0.89, 1.43, 1.06, 1.01, 0.94, 1.79, 1.12, 2.01, 1.12)

R <- 9999 # set number of sampled permutations
ranksall <- rank(c(A,B)) # pool both samples and jointly rank
k <- 1:length(ranksall) # create a pooled index set
reps <- numeric(R) # create storage for permuted TSs
ts <- sum(ranksall[1:length(A)]) # calculate the observed TS value
ts # print the observed TS value

[1] 126

p-value:

Code

for (i in 1:R) {
  # generate m indices for the first sample
  m <- sample(k, size = length(A), replace = FALSE)
  a1 <- ranksall[m] # assign ranks to permuted sample 1
  reps[i] <- sum(a1) # calculate ith permuted TS value
}
pvalue <- mean(c(ts, reps) >= ts) # calculate upper-tailed p-value
pvalue # print the p-value

[1] 1e-04

Statistical Decision & Interpretation:
Reject H₀

There were a total of 18 participants in this study, 9 in the deaf group and 9 in the hearing group. The mean eye-movement rate for deaf children was 2.57 (SD = 0.458), whereas the mean eye-movement rate in hearing children was 1.26 (SD = 0.396). Mann-Whitney summed ranks was used for permutation and the exact p-value was similar to the Mann-Whitney U Test. Results showed that the difference was statistically significant p < 0.001 and we can conclude that the distribution of eye-movement rate for deaf children are significantly different than for hearing children.

On average, deaf children have a higher rate of eye-movement than hearing children (2.57 vs. 1.26).

Problem 2

Eight subjects were asked to perform a simple puzzle-assembly task under normal conditions and under stressful conditions. During the stressful time a mild shock was delivered to subjects 3 minutes after the start of experiment and every 30 seconds thereafter until the task was completed. Blood pressure readings were taken under both conditions. Do the data present sufficient evidence to indicate higher blood pressure readings under stressful conditions?

Subject Normal Stressful
1 126 130
2 117 118
3 115 125
4 118 120
5 118 121
6 128 125
7 125 130
8 120 120

Code

df2_long <- data.frame(subject = c(rep(1:8, 2)),
                       bp = c(126, 117, 115, 118, 118, 128, 125, 120,
                              130, 118, 125, 120, 121, 125, 130, 120),
                       condition = c(rep('N',8), rep('S',8)))

df2_long

   subject  bp condition
1        1 126         N
2        2 117         N
3        3 115         N
4        4 118         N
5        5 118         N
6        6 128         N
7        7 125         N
8        8 120         N
9        1 130         S
10       2 118         S
11       3 125         S
12       4 120         S
13       5 121         S
14       6 125         S
15       7 130         S
16       8 120         S

Code

df2_long %>%
  group_by(condition) %>%
  get_summary_stats(bp, show = c("mean", "sd", "median", "min", "max"))

# A tibble: 2 × 8
  condition variable     n  mean    sd median   min   max
  <chr>     <fct>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>
1 N         bp           8  121.  4.79    119   115   128
2 S         bp           8  124.  4.63    123   118   130

Code

boxplot2 <- 
  ggpaired(df2_long, x = "condition", y = "bp", 
           order = c("N", "S"),
           ylab = "Blood Pressure", xlab = "Condition")

boxplot2

Method: Paired t-test

Assumptions of the method:
- Observations are continuous or ordinal
- Paired observations must be from the same subject
- The two samples must be normally distributed
- The two samples must have equal variance
- Sample size should be large

Hypotheses:
H₀: µ₁ > µ₂
Blood pressure readings are higher in stressful conditions.

H₁: µ₁ ≠ µ₂
There are no difference in blood pressure readings between normal and stressful condition.

Calculation:
t = -2.023
CI = (-5.965, 0.465)

Code

## Paired t-test
t.test(bp ~ condition, data = df2_long, paired = TRUE)


    Paired t-test

data:  bp by condition
t = -2.0228, df = 7, p-value = 0.08279
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -5.9646841  0.4646841
sample estimates:
mean difference 
          -2.75

Statistical Decision & Interpretation:
Fail to reject H₀

There were a total of 8 participants in this study, and they were each asked to perform a simple puzzle task under normal and stressful conditions. Blood readings were recorded during both conditions. The mean blood pressure readings for the stressful condition were 124 (Md = 123,SD = 4.63), whereas the mean blood pressure readings for the normal condition was 121 (Md = 119, SD = 4.79). A paired t-test showed that the difference was not statistically significant, t(7) = -2.02, p > 0.05. There is no significant difference in the blood readings between the normal and stressful condition.

When the participants were under the normal condition, they had an average reading of 124 for their blood pressure, and had an average of 121 while under the normal condition. There were no difference in blood readings between the normal and stressful condition for the 8 participants while performing the assigned puzzle.

Method: Wilcoxon Signed Rank Test

Assumptions of the method:
- Observations are continuous or ordinal
- Paired observations must be from the same subject

Calculation:
V = 3.5
CI = (-7.50, 1.00)

Code

## Wilcoxon Rank Test
wilcox.test(bp ~ condition, data = df2_long, na.rm = TRUE, 
            paired = TRUE, exact = FALSE, conf.int = TRUE)


    Wilcoxon signed rank test with continuity correction

data:  bp by condition
V = 3.5, p-value = 0.09039
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -7.4999602  0.9999893
sample estimates:
(pseudo)median 
     -3.000058

Statistical Decision & Interpretation:
Fail to reject H₀

There were a total of 8 participants in this study, and they were each asked to perform a simple puzzle task under normal and stressful conditions. Blood readings were recorded during both conditions. The mean blood pressure readings for the stressful condition were 124 (Md = 123, SD = 4.63), whereas the mean blood pressure readings for the normal condition was 121 (Md = 121, SD = 4.79). A Wilcoxon Signed Rank test showed that the difference was not statistically significant, V = 3.5, p > 0.05.There is no significant difference in the blood readings between the normal and stressful condition.

Method: Paired t-test

Observed Test Statistics:

Code

## Paired t-test
samp1 <- c(126, 117, 115, 118, 118, 128, 125, 120)
samp2 <- c(130, 118, 125, 120, 121, 125, 130, 120)

## t-test
R <- 9999
all <- c(samp1, samp2)
k <- 1:length(all)
reps <- numeric(R)
# calculate the observed test statistic value
ts <- t.test(samp1, samp2, var.equal = TRUE, paired = TRUE)$statistic # extracts TS value from t.test
ts # print the observed TS value

        t 
-2.022817

p-value:

Code

for (i in 1:R) {
  # generate m indices for the first sample
  m <- sample(k, size = length(samp1), replace = FALSE)
  s1 <- all[m]
  s2 <- all[-m]
  # calculate ith permuted test statistic
  reps[i] <- t.test(s1, s2, var.equal = TRUE, paired = TRUE)$statistic
}
pvalue <- mean(c(ts, reps) >= ts) # calculate the p-value
pvalue

[1] 0.9574

Statistical Decision & Interpretation:
Fail to reject H₀

There were a total of 8 participants in this study, and they were each asked to perform a simple puzzle task under normal and stressful conditions. Blood readings were recorded during both conditions. The mean blood pressure readings for the stressful condition were 124 (SD = 4.63), whereas the mean blood pressure readings for the normal condition was 121 (SD = 4.79). Test statistics were used for permutation and the exact p-value was similar to the paired t-Test. Results showed that the difference were not statistically significant p > 0.05 and we can conclude that there is no significant difference in the blood readings between the normal and stressful condition.

Method: Wilcoxon Signed Rank Test

Observed Test Statistics:

Code

## Permutation
A <- c(126, 117, 115, 118, 118, 128, 125, 120)
B <- c(130, 118, 125, 120, 121, 125, 130, 120)

R <- 9999 # set number of sampled permutations
ranksall <- rank(c(A,B)) # pool both samples and jointly rank
k <- 1:length(ranksall) # create a pooled index set
reps <- numeric(R) # create storage for permuted TSs
ts <- sum(ranksall[1:length(A)]) # calculate the observed TS value
ts # print the observed TS value

[1] 56

p-value:

Code

for (i in 1:R) {
  # generate m indices for the first sample
  m <- sample(k, size = length(A), replace = FALSE)
  a1 <- ranksall[m] # assign ranks to permuted sample 1
  reps[i] <- sum(a1) # calculate ith permuted TS value
}
pvalue <- mean(c(ts, reps) >= ts) # calculate upper-tailed p-value
pvalue # print the p-value

[1] 0.8988

Statistical Decision & Interpretation:
Fail to reject H₀

There were a total of 8 participants in this study, and they were each asked to perform a simple puzzle task under normal and stressful conditions. Blood readings were recorded during both conditions. The mean blood pressure readings for the stressful condition were 124 (SD = 4.63), whereas the mean blood pressure readings for the normal condition was 121 (SD = 4.79). Test statistics were used for permutation and the exact p-value was similar to the Wilcoxon Signed rank test. Results showed that the difference were not statistically significant p > 0.05 and we can conclude that there is no significant difference in the blood readings between the normal and stressful condition.

Problem 3

LDL levels were compared for patients randomized into four treatment groups A, B, C and D. Do the data provide sufficient evidence to indicate that LDL levels are lower in some treatment groups as compared to others?

A: 124, 167, 135, 160, 159, 144, 133
B: 147, 121, 136, 114, 129, 117, 109
C: 141, 144, 139, 162, 155, 150
D: 117, 128, 102, 119, 128, 123

Code

df3 <- data.frame(levels = c(124, 167, 135, 160, 159, 144, 133,
                             147, 121, 136, 114, 129, 117, 109,
                             141, 144, 139, 162, 155, 150,
                             117, 128, 102, 119, 128, 123),
                  tmt = c(rep("A",7), rep("B",7), rep("C",6), rep("D",6)))
df3

   levels tmt
1     124   A
2     167   A
3     135   A
4     160   A
5     159   A
6     144   A
7     133   A
8     147   B
9     121   B
10    136   B
11    114   B
12    129   B
13    117   B
14    109   B
15    141   C
16    144   C
17    139   C
18    162   C
19    155   C
20    150   C
21    117   D
22    128   D
23    102   D
24    119   D
25    128   D
26    123   D

Code

df3 %>% 
  group_by(tmt) %>%
  summarise(
    count = n(),
    mean = mean(levels, na.rm = TRUE),
    sd = sd(levels, na.rm = TRUE),
    median = median(levels),
    min = min(levels),
    max = max(levels)
  )

# A tibble: 4 × 7
  tmt   count  mean    sd median   min   max
  <chr> <int> <dbl> <dbl>  <dbl> <dbl> <dbl>
1 A         7  146  16.2     144   124   167
2 B         7  125. 13.4     121   109   147
3 C         6  148.  8.87    147   139   162
4 D         6  120.  9.69    121   102   128

Code

ggboxplot(df3, x = "tmt", y = "levels",
          color = "tmt", palette = c("#00AFBB", "#E7B800", "#FC4E07", "#29DA35"),
          order = c("A", "B", "C", "D"),
          ylab = "LDL Levels", xlab = "Treatment")

Method: One-way ANOVA

Assumptions of the method:
- Observations are continuous or ordinal
- Observations within each sample must be independent
- The grouped samples must be normally distributed
- The grouped samples must have equal variance
- Sample size should be large

Hypotheses:
H₀: µ_A = µ_B = µ_C = µ_D
There are no differences in LDL levels among the treatment groups.

H₁: µ_A ≠ µ_B ≠ µ_C ≠ µ_D
At least one treatment group is different from another.

Calculation:
F = 8.574

Code

## Calculating one-way ANOVA
res.aov <- aov(levels ~ tmt, data = df3)
summary(res.aov)

            Df Sum Sq Mean Sq F value   Pr(>F)    
tmt          3   4121  1373.6   8.574 0.000587 ***
Residuals   22   3524   160.2                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple Comparison:

Code

## Tukey
TukeyHSD(res.aov)

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = levels ~ tmt, data = df3)

$tmt
          diff        lwr       upr     p adj
B-A -21.285714 -40.072357 -2.499071 0.0225277
C-A   2.500000 -17.053758 22.053758 0.9842484
D-A -26.500000 -46.053758 -6.946242 0.0054861
C-B  23.785714   4.231956 43.339472 0.0133830
D-B  -5.214286 -24.768044 14.339472 0.8796808
D-C -29.000000 -49.291893 -8.708107 0.0033822

Statistical Decision & Interpretation:
Reject H₀

There were a total of 26 participants in this study, and they were randomly assigned into four treat groups: A, B, C, and D. LDL levels were recorded for all four treatment groups. The mean LDL levels for treatment group A was 146 (Md = 144, SD = 16.2, n = 7), for treatment group B was 125 (Md = 121, SD = 13.4, n = 7), for treatment group C was 148 (Md = 147, SD = 8.87, n = 6), and treatment group D was 120 (Md = 121, SD = 9.69, n = 6). A one-way ANOVA test showed that the difference was statistically significant between the treatment groups, F = 8.574, p < 0.001. Tukey pairwise comparison showed there was a significant difference between treatment groups B and A, and B and C, as well between treatment groups between D and A, and D and C. (all p < 0.05).

The significant treatment group differences are between B-A, B-C, D-A, and D-C.Treatment group B had a mean LDL level difference of 21.3 greater than group A [CI = 2.5, 40.1], and 23.8 greater than group C [CI = 4.2, 43.3]. Treatment group D had a mean LDL level difference of 26.5 greater than group A [CI = 6.9, 46.1], and 29.0 greater than C [CI = 8.7, 49.3].

Method: Kruskal-Wallis

Assumptions of the method:
- Observations are continuous or ordinal
- Observations within each sample must be independent

Calculation:
K-W X² = 13.916

Code

## Kruskal-Wallis
kruskal.test(levels ~ tmt, data = df3)


    Kruskal-Wallis rank sum test

data:  levels by tmt
Kruskal-Wallis chi-squared = 13.916, df = 3, p-value = 0.003021

Multiple Comparison:

Code

## Dunn Test
dunnTest(levels ~ as.factor(tmt), data = df3, method="bh")

Dunn (1964) Kruskal-Wallis multiple comparison

  p-values adjusted with the Benjamini-Hochberg method.

  Comparison          Z     P.unadj      P.adj
1      A - B  2.2374887 0.025254416 0.03788162
2      A - C -0.4002715 0.688956592 0.68895659
3      B - C -2.5499811 0.010772877 0.02154575
4      A - D  2.6955343 0.007027584 0.02108275
5      B - D  0.5458247 0.585186438 0.70222373
6      C - D  2.9831931 0.002852580 0.01711548

Statistical Decision & Interpretation:
Reject H₀

There were a total of 26 participants in this study, and they were randomly assigned into four treat groups: A, B, C, and D. LDL levels were recorded for all four treatment groups. The mean LDL levels for treatment group A was 146 (Md = 144, SD = 16.2, n = 7), for treatment group B was 125 (Md = 121, SD = 13.4, n = 7), for treatment group C was 148 (Md = 147, SD = 8.87, n = 6), and treatment group D was 120 (Md = 121, SD = 9.69, n = 6). A Kruskal-Wallis test showed that the difference was statistically significant between the treatment groups, X² = 13.92, p < 0.005. Dunn test was used for pairwise comparison and it showed there was a significant difference between treatment groups B and A, and B and C, as well between treatment groups between D and A, and D and C. (all p < 0.05).

The significant treatment group differences are between B-A, B-C, D-A, and D-C.Treatment group B had a mean LDL level difference of 21.3 greater than group A, and 23.8 greater than group C. Treatment group D had a mean LDL level difference of 26.5 greater than group A, and 29.0 greater than C.

Method: One-way ANOVA

Observed Test Statistics:

Code

## One-way ANOVA
R <- 9999
reps <- numeric(R)
# calculate the observed F test statistic value
raw.results <- lm(levels ~ as.factor(tmt), data = df3)
Fobs <- summary(raw.results)$f[1]
Fobs

   value 
8.573977

p-value:

Code

for (i in 1:R) {
  shuffle <- data.frame(tmt = df3$tmt,
                        levels = sample(df3$levels,
                                        size = dim(df3)[1], 
                                        replace = FALSE))
  shuffled.results <- lm(levels ~ as.factor(tmt), data = shuffle)
  reps[i] <- summary(shuffled.results)$f[1] # extract the ith permuted F statistic
  }
pvalue <- mean(c(Fobs, reps) >= Fobs) # always an upper-tailed test
pvalue

[1] 6e-04

Statistical Decision & Interpretation:
Reject H₀

There were a total of 26 participants in this study, and they were randomly assigned into four treat groups: A, B, C, and D. LDL levels were recorded for all four treatment groups. The mean LDL levels for treatment group A was 146 (Md = 144, SD = 16.2, n = 7), for treatment group B was 125 (Md = 121, SD = 13.4, n = 7), for treatment group C was 148 (Md = 147, SD = 8.87, n = 6), and treatment group D was 120 (Md = 121, SD = 9.69, n = 6). A Test statistics were used for permutation and the exact p-value was similar to the one-way ANOVA test (p < 0.05).

Method: Kruskal-Wallis

Observed Test Statistics:

Code

## Permutation
coin::kruskal_test(levels ~ as.factor(tmt), data = df3)


    Asymptotic Kruskal-Wallis Test

data:  levels by as.factor(tmt) (A, B, C, D)
chi-squared = 13.916, df = 3, p-value = 0.003021

Statistical Decision & Interpretation:
Reject H₀

There were a total of 26 participants in this study, and they were randomly assigned into four treat groups: A, B, C, and D. LDL levels were recorded for all four treatment groups. The mean LDL levels for treatment group A was 146 (Md = 144, SD = 16.2, n = 7), for treatment group B was 125 (Md = 121, SD = 13.4, n = 7), for treatment group C was 148 (Md = 147, SD = 8.87, n = 6), and treatment group D was 120 (Md = 121, SD = 9.69, n = 6). A Test statistics were used for permutation and the exact p-value was similar to the Kruskal-Wallis test (p < 0.05).

Problem 4

Experiment was conducted to compare the effects of three toxic chemicals A, B and C on the skin of lab rats. One-inch squares of skin were treated with the chemicals and then scored from 0 to 10 depending on the degree of irritation (10 being highest). Three adjacent one-inch squares were marked on the backs of eight rats, and each of the three chemicals was applied to each rat. The experiment was blocked on rats to eliminate the variation in skin sensitivity from rat to rat. Do the data provide sufficient evidence to indicate a difference in the toxic effect of the three chemicals?

Rat A B C
1 6 5 3
2 9 9 4
3 6 9 3
4 5 8 6
5 7 8 8
6 5 7 5
7 6 7 5
8 6 7 7

Code

df4_long <- data.frame(rat = c(1:8),
                       degree = c(6, 9, 6, 5, 7, 5, 6, 6,
                                  5, 9, 9, 8, 8, 7, 7, 7,
                                  3, 4, 3, 6, 8, 5, 5, 7),
                       tmt = c(rep('A',8), rep('B',8), rep('C',8)))

df4_long

   rat degree tmt
1    1      6   A
2    2      9   A
3    3      6   A
4    4      5   A
5    5      7   A
6    6      5   A
7    7      6   A
8    8      6   A
9    1      5   B
10   2      9   B
11   3      9   B
12   4      8   B
13   5      8   B
14   6      7   B
15   7      7   B
16   8      7   B
17   1      3   C
18   2      4   C
19   3      3   C
20   4      6   C
21   5      8   C
22   6      5   C
23   7      5   C
24   8      7   C

Code

df4_long %>%
  group_by(tmt) %>%
  summarise(
    count = n(),
    mean = mean(degree, na.rm = TRUE),
    sd = sd(degree, na.rm = TRUE),
    median = median(degree),
    min = min(degree),
    max = max(degree)
  )

# A tibble: 3 × 7
  tmt   count  mean    sd median   min   max
  <chr> <int> <dbl> <dbl>  <dbl> <dbl> <dbl>
1 A         8  6.25  1.28    6       5     9
2 B         8  7.5   1.31    7.5     5     9
3 C         8  5.12  1.81    5       3     8

Code

bxp <- ggboxplot(df4_long, x = "tmt", y = "degree", add = "point")
bxp

Method: One-way repeated measures ANOVA

Assumptions of the method:
- Normality
- Homoscedasticity
- Independence

Hypotheses:
H₀: µ_A = µ_B = µ_C
There are no differences in the toxic effect among the treatment groups.

H₁: µ_A ≠ µ_B ≠ µ_C
At least one treatment group is different from another.

Calculation:
F = 5.766

Code

## Calculating One-way repeated measures ANOVA
res_rep_aov <- anova_test(data = df4_long, dv = degree, wid = rat, within = tmt)
get_anova_table(res_rep_aov)

ANOVA Table (type III tests)

  Effect DFn DFd     F     p p<.05   ges
1    tmt   2  14 5.766 0.015     * 0.327

Multiple Comparison:

Code

## Pairwise comparison - bonferroni
  df4_long %>%
  pairwise_t_test(degree ~ tmt, paired = TRUE, p.adjust.method = "bonferroni")

# A tibble: 3 × 10
  .y.    group1 group2    n1    n2 statistic    df     p p.adj p.adj.signif
* <chr>  <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl> <dbl> <chr>       
1 degree A      B          8     8     -2.55     7 0.038 0.115 ns          
2 degree A      C          8     8      1.39     7 0.208 0.624 ns          
3 degree B      C          8     8      3.15     7 0.016 0.049 *

Statistical Decision & Interpretation:
Reject H₀

There were a total of 8 lab rats in this study, and three toxic chemicals were treated on each of their skins and degree of skin irritation were recorded. The mean degree of irritation for toxic chemical A was 6.25 (Md = 6, SD = 1.28), for toxic chemical B was 7.5 (Md = 7.5, SD = 1.31), and toxic chemical C was 5.12 (Md = 5, SD = 1.81). A one-way repeated measures ANOVA test showed that the difference was statistically significant between the toxic chemicals, F = 5.766, p < 0.05. Pairwise comparison using Bonferroni showed there was a significant difference between toxic chemical B and A, and B and C (all p < 0.05).

The significant group differences are between B-A and B-C. Toxic chemical B had a mean difference in degree of skin irritation of 1.25 greater than toxic chemical A, and a mean difference of 2.38 greater than group C.

Method: Friedman Test

Assumptions of the method:
- One group that is measured on 3 or more different occasions.
- Group is a random sample from the population.
- Dependent variable should be measured at the ordinal/continuous level

Hypotheses:
H₀: µ_A = µ_B = µ_C
There are no differences in the toxic effect among the treatment groups.

H₁: µ_A ≠ µ_B ≠ µ_C
At least one treatment group is different from another.

Calculation:
t.s. = 6.64

Code

## Friedman Test
df4_long %>% 
  rstatix::friedman_test(degree ~ tmt |rat)

# A tibble: 1 × 6
  .y.        n statistic    df      p method       
* <chr>  <int>     <dbl> <dbl>  <dbl> <chr>        
1 degree     8      6.64     2 0.0361 Friedman test

Multiple Comparison:

Code

## Pairwise comparison - bonferroni
  df4_long %>%
  pairwise_t_test(degree ~ tmt, paired = TRUE, p.adjust.method = "bonferroni")

# A tibble: 3 × 10
  .y.    group1 group2    n1    n2 statistic    df     p p.adj p.adj.signif
* <chr>  <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl> <dbl> <chr>       
1 degree A      B          8     8     -2.55     7 0.038 0.115 ns          
2 degree A      C          8     8      1.39     7 0.208 0.624 ns          
3 degree B      C          8     8      3.15     7 0.016 0.049 *

Statistical Decision & Interpretation:
Reject H₀

There were a total of 8 lab rats in this study, and three toxic chemicals were treated on each of their skins and degree of skin irritation were recorded. The mean degree of irritation for toxic chemical A was 6.25 (Md = 6, SD = 1.28), for toxic chemical B was 7.5 (Md = 7.5, SD = 1.31), and toxic chemical C was 5.12 (Md = 5, SD = 1.81). A Friedman test showed that the difference was statistically significant between the toxic chemicals, t.s. = 6.64, p < 0.05. Pairwise comparison using Bonferroni showed there was a significant difference between toxic chemical B and A, and B and C (all p < 0.05).

Method: One-way repeated measures ANOVA

Observed Test Statistics:

Code

## Repeated ANOVA
R <- 9999
reps <- numeric(R)
# calculate the observed F test statistic value
raw.results <- lm(degree ~ as.factor(tmt), data = df4_long)
Fobs <- summary(raw.results)$f[1]
Fobs

   value 
5.113208

p-value:

Code

for (i in 1:R) {
  shuffle <- data.frame(tmt = df4_long$tmt,
                        degree = sample(df4_long$degree,
                                         size = dim(df4_long)[1], replace = FALSE))
  shuffled.results <- lm(degree ~ as.factor(tmt), data = shuffle)
  reps[i] <- summary(shuffled.results)$f[1] # extract the ith permuted F statistic
}
pvalue <- mean(c(Fobs, reps) >= Fobs) # always an upper-tailed test
pvalue

[1] 0.0166

Statistical Decision & Interpretation:
Reject H₀

There were a total of 8 lab rats in this study, and three toxic chemicals were treated on each of their skins and degree of skin irritation were recorded. The mean degree of irritation for toxic chemical A was 6.25 (Md = 6, SD = 1.28), for toxic chemical B was 7.5 (Md = 7.5, SD = 1.31), and toxic chemical C was 5.12 (Md = 5, SD = 1.81). A Test statistics were used for permutation and the exact p-value was similar to the one-way repeated measures ANOVA test (p < 0.05).

Method: Friedman Test

Observed Test Statistics:

Code

## Permutation
coin::friedman_test(degree ~ as.factor(tmt), data = df4_long)


    Asymptotic Friedman Test

data:  degree by
     as.factor(tmt) (A, B, C) 
     stratified by block
chi-squared = 6.6429, df = 2, p-value = 0.0361

Statistical Decision & Interpretation:
Reject H₀

There were a total of 8 lab rats in this study, and three toxic chemicals were treated on each of their skins and degree of skin irritation were recorded. The mean degree of irritation for toxic chemical A was 6.25 (Md = 6, SD = 1.28), for toxic chemical B was 7.5 (Md = 7.5, SD = 1.31), and toxic chemical C was 5.12 (Md = 5, SD = 1.81). A Test statistics were used for permutation and the exact p-value was similar to the Friedman test (p < 0.05).

Problem 5

A behavioral scientist wanted to examine the relationship between body image satisfaction (with higher rating indication higher satisfaction) and actual weight (in pounds) for 12 female subjects enrolled in the study. Do these data provide sufficient evidence to indicate correlation between satisfaction rating and weight? In addition, fit a nonparametric regression model to this data such as monotonic robust rank regression, spline, LOESS or GAM model.

Subject Satisfaction Rating Weight (lb)
1 12 75
2 7 165
3 5 300
4 19 95
5 17 180
6 12 240
7 9 120
8 18 60
9 3 230
10 8 200
11 15 105
12 4 190

Code

df5 <- data.frame(subject = c(1:12),
                  rating = c(12, 7, 5, 19, 17, 12, 9, 18, 3, 8, 15, 4),
                  weight = c(75, 165, 300, 95, 180, 240, 120, 60, 230, 200, 105, 190))
df5

   subject rating weight
1        1     12     75
2        2      7    165
3        3      5    300
4        4     19     95
5        5     17    180
6        6     12    240
7        7      9    120
8        8     18     60
9        9      3    230
10      10      8    200
11      11     15    105
12      12      4    190

Satisfaction Rating

Code

df5 %>%
  summarise(
    count = n(),
    mean = mean(rating, na.rm = TRUE),
    sd = sd(rating, na.rm = TRUE),
    median = median(rating),
    min = min(rating),
    max = max(rating))

  count  mean       sd median min max
1    12 10.75 5.594234   10.5   3  19

Weight

Code

df5 %>% 
    summarise(
    count = n(),
    mean = mean(weight, na.rm = TRUE),
    sd = sd(weight, na.rm = TRUE),
    median = median(weight),
    min = min(weight),
    max = max(weight))

  count     mean       sd median min max
1    12 163.3333 73.71115  172.5  60 300

Code

ggplot(df5, aes(x = weight, y = rating)) +
  geom_point() +
  theme_minimal()

Method: Pearson’s Correlation

Assumptions of the method:
- Variables are continuous
- Variables are paired
- Linear relationship
- No significant outliers

Hypotheses:
H₀: ρ = 0
There is no correlation between weight and satisfaction rating.

H₁: ρ ≠ 0
There is a correlation between weight and satisfaction rating.

Calculation:
ρ = -0.658

Code

## Pearson Correlation
cor.test(df5$rating, df5$weight, method = "pearson")


    Pearson's product-moment correlation

data:  df5$rating and df5$weight
t = -2.7638, df = 10, p-value = 0.02
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.8942476 -0.1352617
sample estimates:
       cor 
-0.6580787

Statistical Decision & Interpretation:
Reject H₀

There were a total of 12 female subjects in this study, and a behavioral scientist wants to examine the relationship between body image satisfaction and weight (in pounds). The mean level of satisfaction rating was 10.75 (Md = 10.5, SD = 5.59) and the mean weight (in lb) is 163.3 (Md = 172.5, SD = 73.71) . A Pearson’s Correlation was run to access the relationship between satisfaction rating and weight. There was a strong negative correlation between satisfaction rating and weight r(10) = -0.658 (p < 0.05).

There is a negative correlation between female’s body image satisfaction and weight, where the less a female weights, the more body image satisfaction rating they recorded.

Method: Spearman’s Correlation

Assumptions of the method:
- Variables are continuous
- Variables are paired
- Linear relationship

Hypotheses:
H₀: ρ = 0
There is no correlation between weight and satisfaction rating.

H₁: ρ ≠ 0
There is a correlation between weight and satisfaction rating.

Calculation:
ρ = -0.669

Code

## Spearman Correlation
cor.test(df5$rating, df5$weight, method = "spearman")

Warning in cor.test.default(df5$rating, df5$weight, method = "spearman"): Cannot
compute exact p-value with ties


    Spearman's rank correlation rho

data:  df5$rating and df5$weight
S = 477.33, p-value = 0.01736
alternative hypothesis: true rho is not equal to 0
sample estimates:
       rho 
-0.6690028

Statistical Decision & Interpretation:
Reject H₀

There were a total of 12 female subjects in this study, and a behavioral scientist wants to examine the relationship between body image satisfaction and weight (in pounds). The mean level of satisfaction rating was 10.75 (Md = 10.5, SD = 5.59) and the mean weight (in lb) is 163.3 (Md = 172.5, SD = 73.71). A Spearman’s Correlation was run to access the relationship between satisfaction rating and weight. There was a strong negative correlation between satisfaction rating and weight r(10) = -0.669 (p < 0.05).

There is a negative correlation between female’s body image satisfaction and weight, where the less a female weights, the more body image satisfaction rating they recorded.

Method: Pearson’s Correlation

Observed Test Statistics:

Code

## Permutation
x <- c(12, 7, 5, 19, 17, 12, 9, 18, 3, 8, 15, 4)
y <- c(75, 165, 300, 95, 180, 240, 120, 60, 230, 200, 105, 190)
both <- data.frame(x,y)

R <- 9999
reps <- numeric(R)
# calculate the observed Spearman rank correlation
pearsoncorr <- cor(x, y, method = "pearson")
pearsoncorr

[1] -0.6580787

p-value:

Code

for (i in 1:R) {
  x1 <- both$y
  y1 <- sample(both$x, size = dim(both)[1], replace = FALSE)
  reps[i] <- cor(y1, x1, method = "spearman")
  }
pvalue <- mean(c(pearsoncorr, reps) >= pearsoncorr)
pvalue

[1] 0.9876

Statistical Decision & Interpretation:
Reject H₀

There were a total of 12 female subjects in this study, and a behavioral scientist wants to examine the relationship between body image satisfaction and weight (in pounds). The mean level of satisfaction rating was 10.75 (Md = 10.5, SD = 5.59) and the mean weight (in lb) is 163.3 (Md = 172.5, SD = 73.71). A Test statistics were used for permutation and the exact p-value was similar to the Pearson’s Correlation (p < 0.05).

Method: Spearman Correlation

Observed Test Statistics:

Code

## Permutation
x <- c(12, 7, 5, 19, 17, 12, 9, 18, 3, 8, 15, 4)
y <- c(75, 165, 300, 95, 180, 240, 120, 60, 230, 200, 105, 190)
both <- data.frame(x,y)

R <- 9999
reps <- numeric(R)
# calculate the observed Spearman rank correlation
spearmancorr <- cor(x, y, method = "spearman")
spearmancorr

[1] -0.6690028

p-value:

Code

for (i in 1:R) {
  x1 <- both$y
  y1 <- sample(both$x, size = dim(both)[1], replace = FALSE)
  reps[i] <- cor(y1, x1, method = "spearman")
}
pvalue <- mean(c(spearmancorr, reps) >= spearmancorr)
pvalue

[1] 0.9897

Statistical Decision & Interpretation:
Reject H₀

There were a total of 12 female subjects in this study, and a behavioral scientist wants to examine the relationship between body image satisfaction and weight (in pounds). The mean level of satisfaction rating was 10.75 (Md = 10.5, SD = 5.59) and the mean weight (in lb) is 163.3 (Md = 172.5, SD = 73.71). A Test statistics were used for permutation and the exact p-value was similar to the Spearman’s Correlation (p < 0.05).

Non-Parametric Regression Model

SAS Code:

Code

* creating dataset;
data df5;
input subject $2. rating 3. weight 8.;
datalines;
1 12  75
2 7   165
3 5   300
4 19  95
5 17  180
6 12  240
7 9   120
8 18  60
9 3   230
10 8  200
11 15 105
12 4  190
;
run;

* scatter plot;
ods graphics on;
title Problem 5;
proc sgplot
data = df5;
scatter y = rating x = weight;
run;

* LOESS Model;
title ‘LOESS Model’
ods graphics on;
proc loess data = df5;
model rating = weight / details (ModelSummary OutputStatistics);
run;
ods graphics off;

SAS Results:

Statistical Decision & Interpretation:
Reject H₀

There were a total of 12 female subjects in this study, and a behavioral scientist wants to examine the relationship between body image satisfaction and weight (in pounds). The mean level of satisfaction rating was 10.75 (Md = 10.5, SD = 5.59) and the mean weight (in lb) is 163.3 (Md = 172.5, SD = 73.71). A LOESS model was run to assess the relationship between the satisfaction rating and weight, and based on the aICC and GCV output, the smoothing parameter of 1 was selected.

Proof

Why are all permutations of iid samples from continuous distribution equally likely? Random variable X is iid from a continuous distribution.
Why is P(X_i1 < X_i2 < … < X_in) = P(X_j1 < X_j2 < … < X_jn) = 1/n! for all i,j?

Solution:
Since (X₁,…,X_n) is iid, the sequence (X_𝜋1, …, X_𝜋n) is also iid, for any permutation 𝜋₁,𝜋₂, …,𝜋_n of the numbers {1, …, n}. In particular, it follows that (X₁, …, 𝑋_n) is equal in distribution to (X_𝜋1, …, X_𝜋n). That is, P((X₁, …,𝑋_n) ∈ A) = P((X₁, …, 𝑋_n) ∈ A) for all A. By choosing A to be the set of all increasing tuples, it follows that
P(X₁ < X₂ < ⋯ < X_n) = P(X_𝜋1 < X_𝜋2 < ⋯ < X_𝜋𝑛).
To show that this probability is equal to 1/𝑛!, one must assume that the random variables are non-atomic, that is, for any deterministic number 𝑎, we must have that P(x₁ = a) = 0. (Otherwise, one must take into account the possibility for there to be multiple variables with the same value.) Once we make this assumption, the fact that the probability is 1/n! follows by summing up the probability over all n! permutations: the total probability must be 1, and each probability being summed is the same.