# Crucial settings
rm(list = ls()) # clean up the environment

# Load libraries
library(knitr)
library(tidyverse)
library(readxl)

Statistical power

Statistical power is the probability that a test will detect an effect (e.g. a significant correlation coefficient) given that the effect truly exists. The higher the power, the more likely the test is to detect a true effect.

Various factors affect statistical power: the sample size, the effect size, and the inherent variability in the data.

In fact, you should estimate the power of the test a priori, already when you plan your study (so that you can collect a sufficient sample size - this is actually the only factor associated with power analysis that you can control), see more on that here. But if that is not possible, you can always evaluate a posteriori your chance of detecting an effect if one exists. For those who would like to dive deeper into the concept of statistical power, I recommend this post.

And for those who would like to perform a power analysis on their own, this link could be a good start.

To briefly illustrate the issue, let's consider the plot below.

The plot presents the relationship between the estimated effect (here, the correlation coefficient) and the sample size, for different levels of statistical power (coloured lines). There are two important messages from the plot, both pretty intuitive (and already stated above):

  1. the power of the test depends on the sample size and on the size of the effect,

  2. for smaller effects, bigger sample sizes are needed to reach the same power.

Thus, if we agreed upon, let's say, 70% test power (light-blue line), then for a coefficient of 0.3 we would need ca. 100 data points (a priori conclusion); conversely, the power of a test for a coefficient of 0.3 estimated from 50 data points would be somewhere around 50% (a posteriori conclusion).
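
Such numbers can also be computed rather than read off a plot. Below is a minimal sketch using the pwr package (which is not loaded at the top of this document, so this assumes you have it installed); the exact values may differ slightly from those read from the plot, depending on the assumed significance level.

# install.packages("pwr") # if the package is not installed yet
library(pwr)

# A priori: sample size needed to detect r = 0.3 with 70% power at alpha = 0.05
pwr.r.test(r = 0.3, power = 0.7, sig.level = 0.05)

# A posteriori: power achieved for r = 0.3 with only 50 data points
pwr.r.test(n = 50, r = 0.3, sig.level = 0.05)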

Sometimes we are simply not able to collect a bigger sample. Performing classical statistical tests on such a small sample will then inevitably yield results of low power. Therefore, whenever this is the case, an alternative approach to testing the significance of the observed effect is recommended - randomization (permutation) or bootstrapping.

Case study

Let's consider a little example with the little auk.

The little auk (Alle alle), as you can see in the picture above, is a small, black-and-white, monomorphic (males and females look similar) seabird. Perhaps you can also make an educated guess about the breeding system of the species, which is monogamy (a male pairs with a single female). Importantly (and hard to figure out from the picture), the partners mate for life, meaning that choosing the right breeding partner is of great importance in the little auk.

Choosing the right partner is not straightforward, but it has been established that partners tend to pair with respect to several traits in an assortative manner. What these traits are, and what assortativity is, is not part of the present story, but if you got interested you can read more about it here. For the present story, the most important point is that a positive relationship has been found between partners in the size of their white eye-lid patches, suggesting assortative mating. Statistically speaking, the larger the white eye-lid patch of a male, the larger the patch of the female paired with that male. The plot below nicely illustrates this relationship, and the simplest index describing it is the correlation coefficient.

# Load library and data

library(ggplot2)
white.eye <- read.csv("C:/Users/KWJ/Dropbox/Working files/Dydaktyka/Biostatistics/Classes10_randomizacja/white.eye.csv", sep = ";") # note the data are stored in a semicolon-separated csv file, hence read.csv() with sep = ";"

# Explore the relationship
ggplot(data = white.eye, mapping = aes(x = Female,  y = Male)) + 
  geom_point() + 
  geom_smooth(method = "lm") +
  theme_classic()

# Correlation coefficient
asstest <- round(cor(x = white.eye$Female, y = white.eye$Male),2); cat(asstest)
0.32

We can see there is a positive trend (as denoted by the regression line), and the correlation coefficient is 0.32.

Now the question is whether this correlation coefficient is significant. Well, if we perform the relevant test, it is:

cor.test(x = white.eye$Female, y = white.eye$Male)

    Pearson's product-moment correlation

data:  white.eye$Female and white.eye$Male
t = 2.4127, df = 50, p-value = 0.01954
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.05486219 0.54757207
sample estimates:
      cor 
0.3229277 

There are 52 data points, and we already know that for such a sample size and a correlation coefficient of 0.32 the statistical power of the analysis is pretty low (~ 50%). Let's then try to apply randomization and bootstrapping.

Randomization (permutation)

A randomization (also called permutation) test is a type of statistical significance test in which the distribution of the test statistic under the null hypothesis is obtained by calculating the test statistic for many possible rearrangements of the observed data points. It may sound enigmatic, but it will become clear with a description of the procedure, which, narrated with the little auk example, goes as follows:

  1. First, we calculate an effect from the data set. For the little auk example it will be the correlation coefficient for the white eye-lid patches of the breeding partners, which is 0.32. Let's call it hereafter the observed coefficient.

  2. Then, we shuffle our data and calculate the effect for such a randomized data set. In the considered example we shuffle the males so that they are paired randomly with the females, and for these random pairs we calculate the correlation coefficient. Let's call it hereafter the randomized coefficient. So, we get a correlation coefficient for pairs that have been created purely by chance (i.e. the pairs were not formed based on the birds' preference for the size of the white eye-lid patch; instead, His Majesty R "forced" the pairing in a random manner).

  3. Since we want to explore thoroughly what happens to the effect (here, the correlation coefficient) in such a "randomized reality", we need to repeat the second step many times (1,000-10,000 is the commonly used range).

  4. Then, we plot all the randomized effects (coefficients), and so we obtain the distribution of the effects (correlation coefficients) that could be expected from the analyzed data set under random rearrangements of the observed data points.

  5. Owing to the multiple iterations we obtain a (roughly normal) distribution of the randomized effects (coefficients). Thus, we can relate our observed coefficient to that distribution and see how different it is from what could be expected by chance. In practice, we calculate the proportion of randomized effects (coefficients) that are equal to or more extreme than the observed one, and this proportion can be treated as the p value (see the compact sketch just below).
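
To make this concrete, here is a compact base-R sketch of the whole procedure, using the white.eye data introduced above (sample() with no extra arguments simply permutes the Male column, and replicate() repeats the shuffle-and-correlate step 1000 times); the next section builds the same thing step by step with an explicit for loop.

rand_cors <- replicate(1000, cor(white.eye$Female, sample(white.eye$Male))) # 1000 randomized coefficients
mean(rand_cors >= 0.32) # proportion equal to or larger than the observed coefficient - the permutation p value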

How to do it in R

To perform the randomization we will apply a for loop - a simple yet powerful tool in R designed to repeat an operation a fixed number of times. I will guide you through this loop here (see the comments in the code), but if you would like to familiarize yourself with loops in R (and I would encourage you to do so), there are plenty of materials on the Internet (an example); simply type "for loop in R" and enjoy!

set.seed(1313) # we need to set a seed for reproducible results (if you do not set it you will get slightly different values)

# Initial settings (set-up before running the loop)

N <- 1000 # number of iterations (the number of times we want to repeat the sampling, i.e. point 2 in the procedure described above)
out <- numeric(N) # an empty vector where results from the loop will be stored

female <- white.eye$Female # for convenience, assign one of the variables in the correlation (females) to a "female" vector (i.e. inside the loop it is easier/clearer to write just "female" instead of "white.eye$Female").

# The loop

# The loop consists of two parts:

# The first part, in (round brackets), tells R how the iterations are supposed to be performed. It reads: for each i element/step in a sequence from 1 to N (which is 1000, as we specified above), do what is inside the {curly braces}.

# The second part, in the {curly braces}, is the core of the loop and does the essential work. Here we want to first shuffle the data and then calculate the correlation coefficient. Of course, we also want to save this coefficient for further analyses.

# To shuffle the data we use a simple trick: we keep the females variable as it is, and rearrange the males variable using the sample() function.

# The sample() function requires three arguments:

# a) the vector from which data will be sampled (here, the males variable)

# b) the number of values to be sampled (here, the number of females to be paired with)

# c) replace - a TRUE/FALSE argument. FALSE, as used here, means that once a data point has been sampled in a given step it becomes unavailable for the next draw within the same step (i.e. when sampling 20 values from a 20-element vector you will get all of these elements back, just in a different order after the sampling)
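
# A tiny toy illustration of sampling without replacement (made-up numbers, not part of
# the little auk data): all four values come back, only their order changes between runs.
sample(c(10, 20, 30, 40), 4, replace = FALSE)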


for (i in 1:N) { 
   male <- sample(white.eye$Male, nrow(white.eye), replace = FALSE) # rearrangement of the males variable
   out[i] <- cor(x = female, y = male) # correlation coefficient assigned to the "out" vector at position i
}

# In other words, in the very first step (i = 1) the loop rearranges the males variable with the sample() function, calculates the correlation coefficient for the two variables - the females variable (as it was) and the rearranged males variable (as if they were pairs) - and assigns the result to the "out" vector (created in advance) as its first element (out[1]). Then it goes to the second step (now i = 2), de novo rearranges the males variable with the sample() function, calculates the correlation coefficient for the females variable (as it was) and the newly rearranged males variable, and assigns the result to the same "out" vector, but now at the second position (out[2]). And so on, and so on, up to 1000 times.

# As said in the main text, the proportion of cases in which the randomized correlation coefficient is equal to or more extreme (here: higher) than the observed one can be considered a p value.

# To calculate this proportion we use a simple trick: we apply a boolean (TRUE/FALSE) comparison of the randomized coefficients with the observed one. Then we simply sum up all the TRUEs, which are equal to 1 in R (FALSE equals 0). Since we performed N = 1000 iterations, we divide the sum by 1000 to get the proportion. For clarity, we also round this p value to 3 decimal places with the round() function. Finally, we assign the proportion to a "p.val" object, to use it later in the code for the plot.
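
# A toy illustration of the counting trick (made-up numbers): the comparison returns a
# logical vector, and summing it counts the TRUEs (TRUE counts as 1, FALSE as 0).
sum(c(0.10, 0.40, 0.35) >= 0.32) # gives 2, because two of the three values are >= 0.32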

p.val <- round(sum(out>=asstest)/N,3)

# Now we can visualise the randomized correlation coefficients.

# Plot for the outcome of the loop

ggplot() + 
  geom_density(aes(out), fill = "grey") +
  theme_classic() +

  # we can also add the observed correlation coefficient in the form of a vertical line
  # to do this we use geom_vline, where xintercept is the value of the observed correlation coefficient
  geom_vline(aes(xintercept = asstest), linetype = "dashed", color = "red") +

  # and some plot tuning...   
  scale_x_continuous(limits = c(-0.5, 0.5)) +
  scale_y_continuous(expand = c(0,0)) +
  
  # to add the p value, we can paste it into the title:
  labs(title = paste0("Permutation - distribution of the correlation coefficient\n for the partners' white patch,\n p = ", p.val), 
       x = "Coefficient", 
       y = "Density")

INTERPRETATION: We can see that the great majority of the correlation coefficients obtained from the shuffled data sets (grey density) are lower than the observed one (red vertical line). Actually, only 1.1% of them are equal to or more extreme (higher). If so, we can reject the null hypothesis, as we have only a 1.1% chance of making a type I error. And what is the null hypothesis in this case? Simple: that the observed effect (here, the coefficient) is random, i.e. could be obtained by chance. The randomization tells us that such a coefficient would arise by chance in only 1.1% of cases. That is very little, so we reject H0 and conclude that the observed effect (coefficient) is not random: it is potentially meaningful in a biological sense and very unlikely to be driven by chance alone.

Bootstrap

The bootstrap procedure is very similar to permutation. The difference is that here we sample full records from the data set WITH REPLACEMENT, so we do not change the internal relationships between the variables, as was the case for the permutation. We simply construct a new data set from the original one, except that in the resampled set some records may be represented multiple times, while others may not be represented at all. As you can easily deduce, this procedure is well suited to testing how influential particular data points are in your set (i.e. is your statistic significant only because of a single data point?). So this is the key difference with respect to the permutation; everything else is pretty much the same. As for the permutation, we perform the resampling multiple times, each time calculating the desired statistic/estimate. At the end of the process we have a distribution of the statistic/estimate, from which we can construct meaningful confidence intervals for the examined estimate/statistic. End of story!
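
As with the permutation, here is a compact base-R sketch of the idea before the loop-based version in the next section: we resample row numbers with replacement, so each female stays paired with her own male.

boot_cors <- replicate(1000, {
   i <- sample(nrow(white.eye), replace = TRUE) # resample row numbers, with replacement
   cor(white.eye$Female[i], white.eye$Male[i])  # correlation for the resampled pairs
})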

How to do it in R

Technically, the procedure is very similar to the permutation described above (see the main text and the comments within the code section). It is based on the for loop, with all the inherent attributes of that tool. The only difference is that we do not really shuffle the data but build a new data set from randomly sampled records. To do so, we apply the sample_n() function to the data frame. Since this is bootstrapping, we also sample the records WITH REPLACEMENT, and for that we set the argument 'replace' to TRUE. Everything else is as for the permutation.

set.seed(1313) # again, we need to set a seed for reproducible results (if you do not set it you will get slightly different values)

# Initial settings (set-up before running the loop)

N <- 1000 # number of iterations (the number of times we want to repeat the sampling, i.e. point 2 in the procedure described above)

out_boot <- numeric(N) # an empty vector where results from the loop will be stored

# The loop (for more details on the loop see the code section for the permutation)

for (i in 1:N) { 
  
   df_boot <- dplyr::sample_n(white.eye, nrow(white.eye), replace = TRUE) # Note that we use the sample_n() function from the dplyr package. This function samples whole rows from the data frame, so males and females stay paired as they are (as expected in bootstrapping). Besides, we sample with replacement, so we end up with a data set of the same size; just some records may be represented multiple times, while others may not be represented at all 
   out_boot[i] <- cor(x = df_boot$Female, y = df_boot$Male) # correlation coefficient assigned to the 'out_boot' vector at position i
}

# Calculating the p value for bootstrapping is a bit different, as we do not refer to the observed coefficient but to 0 - see the interpretation section in the main text below.

p.val_boot <- round(sum(out_boot<=0)/N,3)
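
# Besides the p value, the bootstrap distribution can also be summarised with a simple
# percentile confidence interval (one common choice among several bootstrap CI methods):
# the 2.5% and 97.5% quantiles give an approximate 95% CI for the coefficient.
round(quantile(out_boot, probs = c(0.025, 0.975)), 2)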

# Now we can visualise the bootstrapped correlation coefficients.

# Plot for the outcome of the loop

ggplot() + 
  geom_density(aes(out_boot), fill = "grey") +
  theme_classic() +

  # we can also add the observed correlation coefficient in the form of a vertical line
  # to do this we use geom_vline, where xintercept is the value of the observed correlation coefficient
  geom_vline(aes(xintercept = asstest), linetype = "dashed", color = "red") +

  # we can also denote 0, which is important for inference - see the explanation in the interpretation section in the main text below
  geom_vline(aes(xintercept = 0), linetype = "dashed", color = "blue") +

  # and some plot tuning...   
  scale_x_continuous(limits = c(-0.35, 0.75)) +
  scale_y_continuous(expand = c(0,0)) +
  
  # to add the p value, we can paste it into the title:
  labs(title = paste0("Bootstrap - distribution of the correlation coefficient\n for the partners' white patch,\n p = ", p.val_boot), 
       x = "Coefficient", 
       y = "Density")

INTERPRETATION: We can see that the observed coefficient (red vertical line) lies well within the distribution of the bootstrapped coefficients - actually, right in the middle! And that is good, because this is exactly what we would expect if our data are representative (not biased; properly sampled from the population). The null hypothesis in the bootstrapping procedure is that the distribution of the coefficient overlaps 0 (i.e. the coefficient is equal to 0 or negative). The calculated p value is 0.031, so only 3.1% of the bootstrapped coefficients cross the 0 threshold. Thus, we can reject the null hypothesis, as we have only a 3.1% chance of obtaining a non-significant (equal to 0) or negative coefficient. Note that here we are not very interested in the value of the coefficient itself; we just want to make sure that, regardless of the exact data set, it stays significant (different from 0) and positive.