Reliability

Overview

This worksheet will give you an overview of how to use R to graph and run statistics on data where you are interested in determining reliability and validity.

Reliability of a measure (test) is some measure of its association with itself in repeated trials. “How reproducible is the practical measure?”

Validity of a (practical) measure (test) is some measure of its one-off association with another measure. “How well does the measure measure what it’s supposed to?”

Before you start, ensure you have downloaded CMJ_example01.xlsx and saved it into a specific folder.

Also make sure you set your working directory to this folder (see the setwd() line below).

Then load packages: “tidyverse”, “dplyr”, “ggplot2”, “esquisse”, “performance”, “rmcorr”, “DT”, “ggpubr”, “psych”, “readxl”
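For example, assuming the packages are already installed (install.packages() will fetch any that are missing), you can load them all like this:

library(tidyverse)   # includes dplyr and ggplot2
library(esquisse)
library(performance)
library(rmcorr)
library(DT)
library(ggpubr)
library(psych)
library(readxl)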

Now you are ready to load in your data:

setwd("~/Library/CloudStorage/OneDrive-TeessideUniversity/Work/Teaching/Assessment of human performer/2023_24")

data <- read_excel("CMJ_example01.xlsx", 
                   sheet = "CMJ")
head(data)
# A tibble: 6 × 3
     ID CMJ_Height_1 CMJ_Height_2
  <dbl>        <dbl>        <dbl>
1     1         17.1         20.0
2     2         26.4         29.7
3     3         23.7         26.9
4     4         19.5         20.4
5     5         33.4         34.6
6     6         28.6         30.8

Relative reliability

You can start by graphing your data. The following code loads a graphing package which allows you to make some individualised plots and generates code you can copy and run:

esquisser(data)

Alternatively you can run / amend the following code. Note, you will need to add the correct correlation values for your data.

ggscatter(data, x = "CMJ_Height_2", y = "CMJ_Height_1",
          add = "reg.line", conf.int = TRUE) +
  geom_point(colour = "blue") +
  scale_y_continuous(labels = scales::number_format(accuracy = 0.1)) +
  xlab("Week 2 Counter Movement Jump Height (cm)") +
  ylab("Week 1 Counter Movement Jump Height (cm)") +
  ggtitle("Pearson's r = or ICC with 95% confidence intervals?")
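Note, rather than typing the value into the title by hand, ggpubr’s ggscatter() can annotate the correlation for you via its cor.coef argument - a sketch, assuming you want Pearson’s r:

ggscatter(data, x = "CMJ_Height_2", y = "CMJ_Height_1",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "pearson") +  # adds r and p to the plot
  xlab("Week 2 Counter Movement Jump Height (cm)") +
  ylab("Week 1 Counter Movement Jump Height (cm)")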

Reliability statistics

So these data look to have a strong correlation. But let’s run some statistics - we can use a Pearson’s r, or perhaps our better option, an intraclass correlation coefficient (note: this wants to be an ICC3 for this type of data - single fixed raters).

Pearson’s correlation

So here is the code for a Pearson’s correlation:

pearsons <- cor.test(data$CMJ_Height_2, data$CMJ_Height_1, 
                method = "pearson")
pearsons

    Pearson's product-moment correlation

data:  data$CMJ_Height_2 and data$CMJ_Height_1
t = 26.297, df = 47, p-value < 0.00000000000000022
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.9430712 0.9817243
sample estimates:
      cor 
0.9676565 

We would report this as Pearson’s r = 0.968 (95% CI 0.943 to 0.982). You may report the p-value, but it actually has very little relevance here: stating that the true correlation is not equal to 0 doesn’t tell us much, and if the 95% CI does not overlap zero then p will be <0.05 anyway.

Note, Hopkins et al., 2009 suggest we describe correlations as <0.1 = trivial, 0.1 to <0.3 = small, 0.3 to <0.5 = moderate, 0.5 to <0.7 = high, 0.7 to <0.9 = very high and >0.9 = almost perfect. So we can say this is an “almost perfect” correlation, and we can be confident of this as the lower 95% confidence limit is also >0.9.
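If you want R to apply these descriptors for you, a small helper using cut() could look like this (the function name is just illustrative):

# Illustrative helper: label a correlation using the Hopkins et al. (2009) scale
correlation_descriptor <- function(r) {
  cut(abs(r),
      breaks = c(0, 0.1, 0.3, 0.5, 0.7, 0.9, 1),
      labels = c("trivial", "small", "moderate", "high", "very high", "almost perfect"),
      right = FALSE, include.lowest = TRUE)
}
correlation_descriptor(0.968)  # returns "almost perfect"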

Intraclass correlation coefficient (ICC)

Now for the ICC. We just need to make sure we only have two columns, or it will include the association between the two jumps and the participant ID, which would not be great!
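The code that produced the output below is not shown in full, but given the Call line it will have looked something like this (dropping the ID column first):

ICCd <- data %>% select(CMJ_Height_1, CMJ_Height_2)  # keep only the two jump columns
ICC(ICCd)  # ICC() is from the psych package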

Call: ICC(x = ICCd)

Intraclass correlation coefficients 
                         type  ICC  F df1 df2                                 p
Single_raters_absolute   ICC1 0.89 17  48  49 0.0000000000000000029272372579914
Single_random_raters     ICC2 0.89 60  48  48 0.0000000000000000000000000000016
Single_fixed_raters      ICC3 0.97 60  48  48 0.0000000000000000000000000000016
Average_raters_absolute ICC1k 0.94 17  48  49 0.0000000000000000029272372579914
Average_random_raters   ICC2k 0.94 60  48  48 0.0000000000000000000000000000016
Average_fixed_raters    ICC3k 0.98 60  48  48 0.0000000000000000000000000000016
                        lower bound upper bound
Single_raters_absolute        0.808        0.93
Single_random_raters          0.050        0.97
Single_fixed_raters           0.943        0.98
Average_raters_absolute       0.894        0.97
Average_random_raters         0.096        0.99
Average_fixed_raters          0.970        0.99

 Number of subjects = 49     Number of Judges =  2
See the help file for a discussion of the other 4 McGraw and Wong estimates,

We would report this as ICC(3,1) = 0.967 (95% CI 0.943 to 0.981). As above, you may report the p-value, but why? In this case both correlation types are very similar.

Note: descriptors for the intraclass correlation coefficient (ICC) are as follows: >0.90 (high); 0.80-0.90 (moderate); <0.80 (low)

So you can say we have a high correlation. Great, but remember relative reliability is not ideal on its own. So we need to: 1. check for systematic bias and 2. calculate the within-participant variation.

Systematic bias

To do this we could do with having our data in “long format”, that is, stacked rather than spread. This is easy to do in R, as you can just run this line of code:

long_data<-pivot_longer(data, cols = 2:3, names_to = "Jump", values_to = "Height")
head(long_data)
# A tibble: 6 × 3
     ID Jump         Height
  <dbl> <chr>         <dbl>
1     1 CMJ_Height_1   17.1
2     1 CMJ_Height_2   20.0
3     2 CMJ_Height_1   26.4
4     2 CMJ_Height_2   29.7
5     3 CMJ_Height_1   23.7
6     3 CMJ_Height_2   26.9

Now we can plot it. Again, you can make your own using the esquisse package:

esquisser(long_data)

Or you can run mine here (be warned - it needs work!):

ggplot(long_data) +
  aes(x = Jump, y = Height, fill = Jump) +
  geom_boxplot() +
  geom_jitter(size=2) +
  scale_fill_brewer(palette = "Set2", direction = 1) +
  scale_color_brewer(palette = "Set2", direction = 1) +
  theme_classic() +
  theme(legend.position = "none")

But is there a statistical difference between the two trials? For that we might want to run a paired t-test:

t.test(Height ~ Jump, data = long_data, paired = TRUE)

    Paired t-test

data:  Height by Jump
t = -11.349, df = 48, p-value = 0.000000000000003429
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -2.463432 -1.721925
sample estimates:
mean difference 
      -2.092678 

Here we would report a 2.09 (95% CI 1.72 to 2.46) cm increase in the second test.

Again you could report the p-value here (p<0.0001 in this case) or the t-statistic. NOTE: if you do not have a significant difference, this does not mean the data are statistically equivalent either - that requires another statistic (equivalence testing), but that is getting beyond undergraduate level unless you are doing a dissertation!
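If you are curious, equivalence testing can be sketched with two one-sided paired t-tests (the “TOST” approach); the ±1 cm equivalence bound below is made up purely for illustration:

# TOST sketch: is the mean difference inside a +/- 1 cm equivalence bound?
# (the 1 cm bound is hypothetical - choose one that matters in your context)
t.test(data$CMJ_Height_2, data$CMJ_Height_1, paired = TRUE,
       mu = -1, alternative = "greater")  # is the difference above the lower bound?
t.test(data$CMJ_Height_2, data$CMJ_Height_1, paired = TRUE,
       mu = 1, alternative = "less")      # is the difference below the upper bound?
# Equivalence is claimed only if BOTH one-sided tests are significant.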

Within-participant variation

The typical or standard error of the measurement (TE) is the standard deviation of the differences between test 1 and test 2 divided by the square root of 2 (TE = SD of differences / √2). The coefficient of variation is the TE expressed as a percentage, and should be derived from log-transformed data.

We first need to create a column that calculates the difference between test 1 and 2:

TE <- data %>%
  mutate(Change = CMJ_Height_2 - CMJ_Height_1)

head(TE)
# A tibble: 6 × 4
     ID CMJ_Height_1 CMJ_Height_2 Change
  <dbl>        <dbl>        <dbl>  <dbl>
1     1         17.1         20.0  2.87 
2     2         26.4         29.7  3.32 
3     3         23.7         26.9  3.21 
4     4         19.5         20.4  0.881
5     5         33.4         34.6  1.16 
6     6         28.6         30.8  2.22 

Now we need to create a function to calculate the typical error and then apply it to the new column:

typical_error <- function(x) sd(x) / sqrt(2) # Create own function for typical error
typical_error(TE$Change)
[1] 0.9127147

Now we have a typical or standard error of the measurement of 0.913 cm - this is the noise in your data. So if you have an athlete who has improved their CMJ by 0.9 cm, then you don’t know if this is down to a real improvement or just down to noise or error!

Note, typical error can be easily calculated in Excel, and there are some excellent spreadsheets at sportsci.org that will provide confidence intervals for it and coefficients of variation. (Of course we could do all this in R too, but this should be enough for now!)
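If you do want the coefficient of variation in R, a minimal sketch following the log-transform approach mentioned above (using natural logs; the object names are just illustrative):

log_diff <- log(data$CMJ_Height_2) - log(data$CMJ_Height_1)  # differences on the log scale
TE_log <- sd(log_diff) / sqrt(2)                             # typical error of the logged data
CV_pct <- 100 * (exp(TE_log) - 1)                            # back-transform to a percentage
CV_pct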