#dependencies
library(plyr)      # load before dplyr/tidyverse so plyr does not mask dplyr functions
library(tidyverse) # attaches ggplot2, tibble, dplyr and readr
library(WRS2)
library(devtools)
library(bootcorci)
#Exercise 1
ages <- as.matrix(read.csv("./data/ages.csv", header = TRUE)) %>% na.omit()
male_age <- ages[, "m.age"]
female_age <- ages[, "f.age"]
n.couple <- length(male_age)
Joint.df <- tibble(couple = rep(1:n.couple, 2),
age = c(male_age, female_age),
group = rep(c("male_age", "female_age"), each = n.couple))
A violin plot comparing the ages of the males and females in Dr. Holzleitner's data is shown below. Violin plots are an appropriate way to display these two groups because they show the spread, asymmetry, outliers and skewness of each distribution in a single graph (Hintze & Nelson, 1998).
Violin plot
ggplot(Joint.df, aes(x = group, y = age)) +
geom_violin(colour = "darkgreen", fill = "lightblue") +
labs(title = "Paired observations of ages within couples", y = "age", x = "sex") +
stat_summary(fun = median, fun.min = median, fun.max = median, geom = "crossbar", width = 0.5, colour = "darkred")
The violin plot does not indicate large differences in spread between the ages in Dr. Holzleitner's data, suggesting that the two distributions do not differ greatly.
When analysing dependent (paired) observations it is important to show how observations are linked between conditions, so that more in-depth questions about the data can be addressed (Rousselet et al., 2019). A stripchart of the within-couple differences between female and male ages is therefore an appropriate way of exploring the paired structure of the data.
strip chart of differences
set.seed(10)
df.diff <- tibble(couple = 1:n.couple,
difference = female_age - male_age,
gr = rep("group", n.couple))
ggplot(df.diff, aes(x=gr, y=difference)) +
geom_hline(yintercept = 0, linetype = 2) +
geom_jitter(alpha = 0.7,
shape = 21,
size = 2,
width = .15,
colour = "blue",
fill = "green") +
theme(legend.position="none",
axis.ticks.x = element_line(colour="red"),
axis.text.x = element_text(size=14,colour="red"),
axis.text.y = element_text(size=14),
axis.title.x = element_text(size=16,face="bold",colour="white"),
axis.title.y = element_text(size=16,face="bold"),
plot.title = element_text(colour="black",size=20),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank()) +
ylab("differences") +
labs(title = "Differences between male and female ages") +
scale_y_continuous(breaks = seq(-20, 15, 5)) +
# zoom without dropping observations (the differences range from -16 to 11)
coord_cartesian(ylim = c(-17, 12)) +
stat_summary(fun = median, fun.min = median, fun.max = median,
geom = "crossbar", width = 0.5)
Most differences (female minus male) fall below zero in the stripchart, suggesting that within couples the men in our data tend to be older than the women.
Inferential Statistics
First, the Kolmogorov-Smirnov (KS) test will be used, as it gives insight into whether the distributions of male and female ages in Dr. Holzleitner's data differ.
#is there a tendency for men as a group to be older than women as a group?
Ks.score <- ks.test(male_age, female_age, alternative = "two.sided")
The KS test statistic was D = 0.13, which indicates a difference between the male and female age distributions, though not a large one.
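The statistic and p-value can be read directly from the htest object returned by ks.test:
Ks.score$statistic # D, the maximum distance between the two empirical distribution functions
Ks.score$p.value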
To investigate whether men tend to be older than women within couples we will use a one-sample percentile bootstrap test on the paired differences. This method is appropriate because we want to make inferences about the typical within-couple difference (Gerald, 2018).
onesampb(female_age - male_age, nboot = 2000, alpha=.05)
## Call:
## onesampb(x = female_age - male_age, nboot = 2000, alpha = 0.05)
##
## Robust location estimate: -1.9858
## 0.95% confidence interval: -2.4342 -1.4852
## p-value: 0
summary(female_age - male_age)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -16.000 -4.000 -1.000 -2.238 0.000 11.000
The robust location estimate of the differences is -1.99, 95% CI [-2.43, -1.49], and the median of the differences is -1. Within couples, then, the man tends to be older than the woman.
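Because onesampb reports a CI for its default robust location estimator rather than for the median itself, a minimal percentile bootstrap sketch for the median of the paired differences (using the same data as above) is:
set.seed(10)
boot.med <- replicate(2000, median(sample(female_age - male_age, replace = TRUE)))
quantile(boot.med, probs = c(0.025, 0.975)) # percentile bootstrap CI for the median difference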
Conclusions The KS test suggests that the ages of men and women differ as groups. This is consistent with the violin plot, in which the male distribution sits slightly higher than the female one. On that basis, men as a group may tend to be older than women, though more extensive testing would be needed to reach a definitive conclusion.
The percentile bootstrap on the paired differences showed that, within couples, men tend to be older than women.
#Exercise 2
TD <- read.table("data/TD_OPAF.txt")
ASD <- read.table("data/ASD_OPAF.txt")
TD <- as.matrix(TD)
ASD <- as.matrix(ASD)
n.TD <- length(TD)
n.ASD <- length(ASD)
Replicating the graph
df <- tibble(Occipital.peak.alpha.frequency = c(TD, ASD),
group = c(rep("TD", n.TD), rep("ASD", n.ASD)))
ggplot(df, aes(x = group,
y = Occipital.peak.alpha.frequency,
fill = group)) +
geom_jitter(alpha = 0.6,
shape = 20,
size = 4,
width = .15) +
theme_bw() +
theme(legend.position = "none") +
theme(axis.title.y = element_text(size=16,face="bold"),
axis.title.x = element_blank(),
axis.text = element_text(size=14)) +
ylab("Occipital peak alpha frequency") +
coord_cartesian(ylim = c(6, 12)) +
stat_summary(fun = mean, fun.min = mean, fun.max = mean,
geom = "crossbar", width = 0.4, colour = "purple",
fun.args = list(trim = 0.2)) # crossbar marks the 20% trimmed mean; median() has no trim argument, so fun = mean is required
Our graph appears similar to the one produced by Dickinson et al. (2018), with the TD children clustering at a higher OPAF than the ASD children.
The Wilcoxon-Mann-Whitney (WMW) test was chosen because it estimates the probability that a random observation from one group is larger than a random observation from the other (Wilcox & Rousselet, 2018). It is therefore well suited to asking whether ASD children tend to have lower OPAF than TD children.
## Wilcoxon-Mann-Whitney test
m2 <- wilcox.test(Occipital.peak.alpha.frequency ~ group, df, exact=FALSE, alternative = "less")
The test returned p = 0.13, above the conventional 0.05 threshold, so it does not provide support for the notion that ASD children have lower OPAF than TD children.
All pairwise differences
Collating all pairwise differences between the groups characterises how the TD and ASD distributions differ.
TASD <- as.vector(outer(TD, ASD, FUN = "-")) # every TD value minus every ASD value
# toy example illustrating how outer() forms all pairwise differences
x <- 1:6
y <- 1:6
outer(x, y, FUN = "-")
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0 -1 -2 -3 -4 -5
## [2,] 1 0 -1 -2 -3 -4
## [3,] 2 1 0 -1 -2 -3
## [4,] 3 2 1 0 -1 -2
## [5,] 4 3 2 1 0 -1
## [6,] 5 4 3 2 1 0
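As a complement to the WMW test, the probability that a randomly selected TD child has a higher OPAF than a randomly selected ASD child can be estimated directly from the pairwise differences computed above (a minimal sketch; ties counted as one half):
p.hat <- mean(TASD > 0) + 0.5 * mean(TASD == 0) # proportion of positive pairwise differences
p.hat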
Histogram
A histogram is a valuable way of examining the pairwise differences, as it makes the dispersion, shape and spread of the data easy to see (Tauxe et al., 1991).
ggplot(tibble::enframe(TASD), aes(value)) +
geom_histogram(alpha = 1 , fill = "darkred", colour = "darkgreen") +
geom_vline(xintercept = median(TASD)) +
theme(legend.position="none",
axis.text = element_text(size=14),
axis.title = element_text(size=16,face="bold")) +
ylab("Histogram") +
xlab("All pairwise differences")
As we are investigating central tendency, it is appropriate to compare the groups on a trimmed mean. We use the bootstrap-t method, which avoids some of the limitations of standard parametric methods, with a 20% trimmed mean, which improves control over the probability of a Type I error (Wilcox & Keselman, 2003). A population trimmed mean is the population mean of a transformed distribution in which the tails have been removed (Rousselet et al., 2019).
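For intuition, a 20% trimmed mean simply discards the lowest and highest 20% of observations before averaging; a toy illustration with made-up values:
toy <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100) # one extreme value
mean(toy)             # 14.5, pulled up by the outlier
mean(toy, trim = 0.2) # 5.5, the mean of the middle six values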
# bootstrap-t on 20% trimmed means
bootstrap_t.ASD <- trimcibt(ASD, nullval = 0, nboot = 1000, tr = 0.2, alpha = .05)
bootstrap_t.TD <- trimcibt(TD, nullval = 0, nboot = 1000, tr = 0.2, alpha = .05)
The bootstrap-t results: ASD trimmed mean = 9.01, p = 0, 95% CI [8.81, 9.22]; TD trimmed mean = 9.17, p = 0, 95% CI [8.84, 9.51].
Conclusions The Wilcoxon-Mann-Whitney test did not provide conclusive support for the view that ASD children have lower OPAF than TD children. With p = 0.13, however, a small effect in the expected direction cannot be ruled out.
The bootstrap-t results for ASD and TD show that the two groups' central tendencies differ only slightly: the 20% trimmed means differ by just 0.16, so the two groups do not vary greatly in central tendency.
#Exercise 3
# exp3.data is assumed to have been loaded earlier as a matrix with columns 'word' and 'non-word'
word <- exp3.data[, !colnames(exp3.data) %in% 'non-word']
non_word <- exp3.data[, !colnames(exp3.data) %in% 'word']
n.word <- length(word)
n.non_word <- length(non_word)
e3.df <- as_tibble(exp3.data) # as.tibble() is deprecated in favour of as_tibble()
e3.df.diff <- as.numeric(non_word - word) # paired differences in reaction time
A percentile bootstrap will be used to generate a highest density interval (HDI) for the 20% trimmed mean of the differences in reaction time between non-word and word trials.
This method is applicable here because it makes few assumptions about the distribution of the differences. Confidence intervals are set to 95%. A confidence interval can be read as the range of parameter values compatible with the data (Rousselet et al., 2019), and the HDI summarises the range of the most credible values of a measurement (Kruschke, 2018).
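To illustrate what the HDI captures, here is a minimal sketch on simulated skewed data (illustrative only, not part of the analysis):
set.seed(1)
sim <- rexp(10000) # a skewed example distribution
HDInterval::hdi(sim, credMass = 0.95)  # shortest interval containing 95% of the values
quantile(sim, probs = c(0.025, 0.975)) # the equal-tailed percentile interval is wider here
Returning to the data, we now compute the bootstrap distribution of the trimmed mean differences.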
set.seed(10)
np <- 50 # number of participants
nboot <- 2000
# bootstrap distribution: nboot resamples of size np, each summarised by its 20% trimmed mean
reaction_time2 <- apply(matrix(sample(e3.df.diff, size = np * nboot, replace = TRUE),
nrow = nboot), 1, mean, trim = 0.2)
alpha <- 0.05
HDI <- HDInterval::hdi(reaction_time2, credMass = 1 - alpha) # HDI of the bootstrap distribution, not of the raw differences
reactiontime.df <- as_tibble(with(density(reaction_time2),data.frame(x,y)))
ggplot(reactiontime.df, aes(x = x, y=y)) +
coord_cartesian(xlim = c(-100, 280)) +
labs(caption = "Bootstrap distribution of 20% trimmed mean differences between reaction times of data from labs 1 and 2",title = "Bootstrap distribution of reaction time differences",
x = "Differences in reaction times",
y = "Density") +
geom_line(size = 2, colour = "red") +
geom_segment(x = HDI[1], xend = HDI[2],
y = 0, yend = 0,
lineend = "round", size = 3, colour = "purple") +
annotate(geom = "label", x = HDI[1]+0.15, y = 0.005, size = 6,
colour = "white", fill = "purple", fontface = "bold",
label = paste("L = ", round(HDI[1], digits = 3))) +
annotate(geom = "label", x = HDI[2]-0.15, y = 0.005, size = 6,
colour = "white", fill = "purple", fontface = "bold",
label = paste("U = ", round(HDI[2], digits = 3))) +
theme(plot.title = element_text(hjust = 0.5, size = 12, face = "bold"),
axis.title = element_text(size = 0))
The group difference in reaction times found in our replication was 132.54, with an HDI of [69.1, 193.1].
Conclusions The group difference found in this replication differs from the reaction time differences reported by both original labs. It is closer to the second lab's estimate, as the difference we found was considerably larger than the first lab's. Our replication also included more participants, which increases the reliability of this estimate.
#Exercise 4
# column 1 = behaviour; column 2 = brain activation
np <- 30 # number of participants
ipl.in <- matrix(unlist(read_csv("./data/inhibition-IPL.csv", col_names = FALSE)), nrow = np)
mfg.in <- matrix(unlist(read_csv("./data/inhibition-MFG.csv", col_names = FALSE)), nrow = np)
ipl.sh <- matrix(unlist(read_csv("./data/shifting-IPL.csv", col_names = FALSE)), nrow = np)
mfg.sh <- matrix(unlist(read_csv("./data/shifting-MFG.csv", col_names = FALSE)), nrow = np)
Considering the conclusions of Sun et al. (2019), there are issues with their correlation analysis. Their choice of the Pearson correlation weakens their claims, because it is not robust: it is sensitive to many factors, including heteroscedasticity and outliers, so a significant test result cannot by itself establish the association between variables (Wilcox, 2017). We will therefore use Spearman's correlation, which is more robust to heteroscedasticity and outliers because it works on the rank order of the data. In addition, percentile bootstrap confidence intervals will be computed, offering another robust alternative to the method used by the researchers.
ipl.in.df <- as.data.frame(ipl.in)
ipl.sh.df <- as.data.frame(ipl.sh)
mfg.in.df <- as.data.frame(mfg.in)
mfg.sh.df <- as.data.frame(mfg.sh)
Bootstrap analyses
First we compute percentile bootstrap CIs for the Pearson correlations. As the correlations are the main outcome of the experiment, and they are fast to compute, it is worth using a large number of bootstrap samples to reduce the uncertainty associated with that source of variability (Hesterberg, 2015).
set.seed(10)
nboot <- 10000
boot.ipl.in <- corci(ipl.in.df$V1, ipl.in.df$V2, method = "pearson", nboot = nboot)
boot.ipl.sh <- corci(ipl.sh.df$V1, ipl.sh.df$V2, method = "pearson", nboot = nboot)
boot.mfg.in <- corci(mfg.in.df$V1, mfg.in.df$V2, method = "pearson", nboot = nboot)
boot.mfg.sh <- corci(mfg.sh.df$V1, mfg.sh.df$V2, method = "pearson", nboot = nboot)
These are the bootstrap results:
boot.ipl.in = -0.31 [-0.64, 0.08], p = 0.06
boot.ipl.sh = -0.65 [-0.87, -0.26], p = 0.001
boot.mfg.in = -0.58 [-0.80, -0.28], p = 0.2e-04
boot.mfg.sh = -0.35 [-0.65, 0.007], p = 0.02
Each CI indicates the range of population correlation values compatible with the data; the width of the intervals reflects the precision achievable with this sample size.
# helper to avoid repeating the same call for each region/condition pair
spearman_ci <- function(df) {
  corci(df$V1, df$V2,
        method = "spearman",
        nboot = 2000,
        alpha = 0.05,
        alternative = "two.sided",
        null.value = 0,
        saveboot = TRUE)
}
Spearman_ipl.in <- spearman_ci(ipl.in.df)
Spearman_ipl.sh <- spearman_ci(ipl.sh.df)
Spearman_mfg.in <- spearman_ci(mfg.in.df)
Spearman_mfg.sh <- spearman_ci(mfg.sh.df)
The Spearman results:
spearman.ipl.in = -0.33 [-0.65, 0.06], p = 0.09
spearman.ipl.sh = -0.62 [-0.85, -0.26], p = 0.005
spearman.mfg.in = -0.62 [-0.83, -0.33], p = 0
spearman.mfg.sh = -0.33 [-0.61, -0.008], p = 0.054
The CIs are relatively narrow for correlations, suggesting some degree of reliability in these estimates. Moreover, most of the intervals lie entirely below zero, which further supports the conclusion that the variables are negatively correlated.
Conclusions To evaluate the conclusions offered by Sun et al. (2019), the results of the Spearman and bootstrap correlations are set against their claims. The bootstrap results contradict their conclusion by displaying that MFG had a stronger relationship with shifting than with inhibition. The Spearman results add to this contradiction, with MFG once again displaying a stronger relationship with shifting than with inhibition.
As the bootstrap and Spearman analyses offer more robust and appropriate methods of measuring correlations between variables, and both contradict the conclusions of Sun et al. (2019), their conclusions should not be supported.
References
Altunkaynak, B., & Gamgam, H. (2019). Bootstrap confidence intervals for the coefficient of quartile variation. Communications in Statistics-Simulation and Computation, 48(7), 2138-2146.
Gerald, B. (2018). A brief review of independent, dependent and one sample t-test. International Journal of Applied Mathematics and Theoretical Physics, 4(2), 50-54.
Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N., & Altman, D. G. (2016). Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations. European Journal of Epidemiology, 31(4), 337-350.
Hintze, J. L., & Nelson, R. D. (1998). Violin plots: a box plot-density trace synergism. The American Statistician, 52(2), 181-184.
Kruschke, J. K. (2018). Rejecting or accepting parameter values in Bayesian estimation. Advances in Methods and Practices in Psychological Science, 1(2), 270-280. https://doi.org/10.1177/2515245918771304
Rousselet, G. A., Pernet, C. R., & Wilcox, R. R. (2019). An introduction to the bootstrap: A versatile method to make inferences by using data-driven simulations. https://doi.org/10.31234/osf.io/h8ft7
Rousselet, G. A., Pernet, C. R., & Wilcox, R. R. (2021). The percentile bootstrap: A primer with step-by-step instructions in R. Advances in Methods and Practices in Psychological Science, 4(1), 2515245920911881.
Sun, X., Li, L., Mo, C., Mo, L., Wang, R., & Ding, G. (2019). Dissociating the neural substrates for inhibition and shifting in domain‐general cognitive control. European Journal of Neuroscience, 50(2), 1920-1931.
Tauxe, L., Kylstra, N., & Constable, C. (1991). Bootstrap statistics for paleomagnetic data. Journal of Geophysical Research: Solid Earth, 96(B7), 11723-11740.
Weissgerber, T. L., Milic, N. M., Winham, S. J., & Garovic, V. D. (2015). Beyond bar and line graphs: Time for a new data presentation paradigm. PLoS Biology, 13(4), e1002128.
Wilcox, R. R., & Keselman, H. J. (2003). Modern robust data analysis methods: Measures of central tendency. Psychological Methods, 8(3), 254-274. https://doi.org/10.1037/1082-989x.8.3.254
Wilcox, R. R., & Rousselet, G. A. (2018). A guide to robust statistical methods in neuroscience. Current Protocols in Neuroscience, 82, 8.42.1-8.42.30. https://doi.org/10.1002/cpns.41