Data lab worksheet

Load packages

Here we are going to take a look at some force frame data for a group of girls football players. All data has been collected under an annonymous ID as it was sent to the VALD hub and has subsequently been downloaded as a .csv file - this you have access to on blackboard.

Install and load relevant packates here: this code lists the names the packages you might need (pakages) highlights any that you don’t have (new.packages) and installs these and then loads them all for you.

packages <- c(  "ggplot2", "ggpubr" ,"readr" , "tidyverse", "Hmisc","plyr", "dplyr" ,"janitor", "readxl", "kableExtra")

new.packages <- packages[!(packages %in% installed.packages()[,"Package"])]

if(length(new.packages)) install.packages(new.packages)
lapply(packages, library, character.only = TRUE)
options(scipen = 9999)
set.seed(123)

Read data

Let’s now set our working directory and read in the data. I have a line of code here that takes the externa_id and changes this to a name based upon a separate excel file. To maintain confidentiality I have set this not to run but left the code there so you can see this is an option for you when working with team data downloaded from an external site like VALD hub.

I have used a clever bit of code from the janitor packages that changes the header names into ones you can use when coding:

names<-names(hip%>%clean_names)

colnames(hip) <- names

setwd("~/Library/CloudStorage/OneDrive-TeessideUniversity/Work/RTC/RTC_ETC_2022_2023/Fitness testing/Strength")

hip <- read_csv("forceframe-test-export-19_01_2023.csv")

names<-names(hip%>%clean_names)
colnames(hip) <- names
hip<-subset(hip, select = -c(name,time_utc, device,  mode, test, notes))
#names(hip)[names(hip) == "external_id"] <- "ID"
#hip <- merge(hip, ID, by = "ID")
hip<-hip[hip$l_reps != 0, ] #Thus removes rows with zeros
head(hip)
## # A tibble: 6 × 19
##   extern…¹ date_…² direc…³ posit…⁴ l_reps r_reps l_max…⁵ r_max…⁶ max_i…⁷ l_max…⁸
##   <chr>    <chr>   <chr>   <chr>    <dbl>  <dbl>   <dbl>   <dbl>   <dbl> <chr>  
## 1 P27      09/01/… Pull    Hip AD…      3      3    241.    259.    6.81 1.04   
## 2 P27      09/01/… Squeeze Hip AD…      3      3    252.    261.    3.64 1.04   
## 3 P35      09/01/… Pull    Hip AD…      3      3    238.    205   -13.7  1.14   
## 4 P35      09/01/… Squeeze Hip AD…      3      3    271.    215   -20.7  1.14   
## 5 P11      09/01/… Pull    Hip AD…      3      3    278     269.   -3.15 0.83   
## 6 P11      09/01/… Squeeze Hip AD…      3      3    232.    249     7.03 0.83   
## # … with 9 more variables: r_max_ratio <chr>, l_avg_force_n <dbl>,
## #   r_avg_force_n <dbl>, avg_imbalance <dbl>, l_avg_ratio <chr>,
## #   r_avg_ratio <chr>, l_max_impulse_ns <dbl>, r_max_impulse_ns <dbl>,
## #   impulse_imbalance_percent <dbl>, and abbreviated variable names
## #   ¹​external_id, ²​date_utc, ³​direction, ⁴​position, ⁵​l_max_force_n,
## #   ⁶​r_max_force_n, ⁷​max_imbalance, ⁸​l_max_ratio

Let’s cut this down a little to just include some key variables left and right leg max force. we’ll also keep the max ratios in (abduction - adduction and the left-right imbalance for now.

## # A tibble: 6 × 7
##   external_id direction l_max_force_n r_max_force_n max_imbala…¹ l_max…² r_max…³
##   <chr>       <chr>             <dbl>         <dbl>        <dbl> <chr>   <chr>  
## 1 P27         Pull               241.          259.         6.81 1.04    1.01   
## 2 P27         Squeeze            252.          261.         3.64 1.04    1.01   
## 3 P35         Pull               238.          205        -13.7  1.14    1.05   
## 4 P35         Squeeze            271.          215        -20.7  1.14    1.05   
## 5 P11         Pull               278           269.        -3.15 0.83    0.92   
## 6 P11         Squeeze            232.          249          7.03 0.83    0.92   
## # … with abbreviated variable names ¹​max_imbalance, ²​l_max_ratio, ³​r_max_ratio

Now let’s do some simple visualizations

The following code will graph the squad data.

ggplot(hip, aes(x = reorder(external_id, l_max_force_n), y = l_max_force_n, fill = external_id)) +
geom_col() +
scale_fill_viridis_d(option = "viridis", direction = 1) +
coord_flip() +
theme_classic() +
theme(legend.position = "none")+
  facet_wrap(vars(direction))

ggplot(hip, aes(x = reorder(external_id, l_max_force_n), y = l_max_force_n, fill = external_id)) +
geom_col() +
scale_fill_viridis_d(option = "viridis", direction = 1) +
coord_flip() +
theme_classic() +
theme(legend.position = "none")+
  facet_wrap(vars(direction))

ggplot(hip, aes(x = reorder(external_id, max_imbalance), y = max_imbalance, fill = external_id)) +
geom_col() +
scale_fill_viridis_d(option = "viridis", direction = 1) +
coord_flip() +
theme_classic() +
theme(legend.position = "none")+
  facet_wrap(vars(direction))

Ranking data

This is some code I borrowed from Patrick Ward’s blog and it is a nice way to rank squad data and create z and t-scores. and might be useful if you are working with team data

z_score <- function(x, avg, SD){
  z = (x - avg) / SD
  return(z)
}

t_score <- function(z){
  t = z * 10 + 50
  return(t)
}

perc.rank <- function(x){
  trunc(rank(x))/length(x)
}

rank_pull <- hip %>%
  filter(direction == "Pull")

left_rank_pull <- rank_pull %>%
  mutate(percentile_rank_left = perc.rank(l_max_force_n))


left_rank_pull %>% 
  arrange(desc(percentile_rank_left)) %>%
  knitr::kable()
external_id direction l_max_force_n r_max_force_n max_imbalance l_max_ratio r_max_ratio percentile_rank_left
P03 Pull 361.125 332.500 -7.9266182 0.98 1.05 1.0000000
P18 Pull 354.500 348.250 -1.7630465 0.87 0.81 0.9677419
P54 Pull 327.750 344.750 4.9311095 0.85 0.83 0.9354839
P26 Pull 304.750 313.000 2.6357827 1.06 1 0.9032258
P16 Pull 299.000 310.750 3.7811746 1.21 1.15 0.8709677
P01 Pull 292.250 318.500 8.2417582 1.13 1.02 0.8387097
P29 Pull 288.000 288.000 0.0000000 1.2 1.18 0.8064516
P13 Pull 282.500 283.875 0.4843681 0.94 0.96 0.7741935
P23 Pull 282.000 273.000 -3.1914894 1.07 1.03 0.7419355
P11 Pull 278.000 269.250 -3.1474820 0.83 0.92 0.7096774
P55 Pull 258.375 288.250 10.3642671 1.09 0.96 0.6774194
P31 Pull 256.750 264.750 3.0217186 1.35 1.25 0.6451613
P46 Pull 256.250 249.875 -2.4878049 1.03 1.02 0.6129032
P30 Pull 255.000 272.000 6.2500000 1.06 1.02 0.5806452
P10 Pull 254.750 275.250 7.4477747 1.06 1 0.5483871
P41 Pull 254.250 285.375 10.9067017 1.38 1.31 0.5161290
P17 Pull 252.000 309.500 18.5783522 1.48 1.27 0.4838710
P50 Pull 251.000 245.500 -2.1912351 0.99 1.14 0.4516129
P08 Pull 246.625 242.750 -1.5712114 0.99 1 0.4193548
P27 Pull 241.250 258.875 6.8083052 1.04 1.01 0.3870968
P14 Pull 239.500 259.500 7.7071291 1.21 1.1 0.3548387
P35 Pull 237.625 205.000 -13.7296160 1.14 1.05 0.3225806
P53 Pull 235.750 230.500 -2.2269353 1.08 1.17 0.2903226
P09 Pull 224.750 216.500 -3.6707453 1.18 1.38 0.2580645
P57 Pull 224.000 268.750 16.6511628 1.32 0.92 0.2258065
P24 Pull 223.500 209.000 -6.4876957 0.96 1.21 0.1935484
P32 Pull 220.750 282.250 21.7891940 1.47 1.12 0.1612903
P47 Pull 211.500 214.750 1.5133877 1.24 1.2 0.1290323
P28 Pull 205.875 212.000 2.8891509 1.06 1.07 0.0967742
P42 Pull 201.625 212.750 5.2291422 1.35 1.26 0.0645161
P58 Pull 153.250 145.875 -4.8123980 1.3 1.43 0.0322581

You can then plot your ranked data:

ggplot(left_rank_pull, aes(x = reorder(external_id, l_max_force_n), y = l_max_force_n, fill = external_id)) +
geom_col() +
scale_fill_viridis_d(option = "viridis", direction = 1) +
coord_flip() +
theme_classic() +
theme(legend.position = "none")

Ratio scaling

Lets first explore the abduction-adduction ratio, the Vald software has already derived our max ratio (squeeze [abduction] / pull [adduction]) although we will need to explore the relationship between each part of the ratio.

Here we will need to reformat the data set so we can graph squeeze and pull strength. There should be a relationship between these variables, otherwise there would be no rationale for normailsing squeeze strength by using pull strength.

To do this we need the data in wide format so we have squeeze and pull in separate columns, here we are going to subset the data so we only have the columns we need and then we will use the pivot_wider() function.

# Select the columns you want to pivot
df_wide <- subset(hip, select=c(external_id, direction, l_max_force_n, r_max_force_n))

# Pivot the data wider
df_wide <- pivot_wider(data = df_wide, names_from = direction, values_from = c(l_max_force_n, r_max_force_n))

# Print the resulting dataframe
head(df_wide)
## # A tibble: 6 × 5
##   external_id l_max_force_n_Pull l_max_force_n_Squeeze r_max_force_n_P…¹ r_max…²
##   <chr>                    <dbl>                 <dbl>             <dbl>   <dbl>
## 1 P27                       241.                  252.              259.    261.
## 2 P35                       238.                  271.              205     215 
## 3 P11                       278                   232.              269.    249 
## 4 P41                       254.                  351               285.    373.
## 5 P58                       153.                  199               146.    208.
## 6 P57                       224                   296.              269.    247 
## # … with abbreviated variable names ¹​r_max_force_n_Pull, ²​r_max_force_n_Squeeze

No we have pull and squeeze force for each player for left and right leg, all in separate columns. This mean’s we can graph the linear relationship and perform a simple correlation. Here is the code for the left leg:

ggscatter(df_wide, x = "l_max_force_n_Pull", y = "l_max_force_n_Squeeze") + 
  geom_point(colour = "blue") +
  geom_smooth(method=lm, se = TRUE)

cor.test(df_wide$l_max_force_n_Pull,df_wide$l_max_force_n_Squeeze)
## 
##  Pearson's product-moment correlation
## 
## data:  df_wide$l_max_force_n_Pull and df_wide$l_max_force_n_Squeeze
## t = 3.5266, df = 29, p-value = 0.001422
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2401186 0.7555214
## sample estimates:
##      cor 
## 0.547846

Good news! These are related r = 0.55 95%CI .24 to 0.76 - great lets look at the ratio a bit further. NOTE: You may wish to do the same for the right leg.

We also need to filter the data to include just the denominator (pull force) and ensure the ratio is a numeric as it defaults to a factor. We can then graph the relationship between the denominator and the ratio.

If the ratio scales correctly there should be no relationship between the two.

ratio<- hip %>% filter(direction %in% "Pull") 

ratio$l_max_ratio = as.numeric(ratio$l_max_ratio)

ggscatter(ratio, x = "l_max_force_n", y = "l_max_ratio") + 
  geom_point(colour = "blue") +
  geom_smooth(method=lm, se = TRUE)

cor.test(ratio$l_max_force_n,ratio$l_max_ratio)
## 
##  Pearson's product-moment correlation
## 
## data:  ratio$l_max_force_n and ratio$l_max_ratio
## t = -3.1677, df = 29, p-value = 0.003603
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.7301747 -0.1861102
## sample estimates:
##        cor 
## -0.5070146

For the left limb we have a clear relationship where the stronger players have a lower ratio r = -0.51, 95% CO -0.73 to -0.19. So be cautious - if you present the ratio with considering its component parts you could be missing something. You could repeat this for the right limb and see if we get the same, or similar a result.

You could use gg arrange to put left versus right graphs together and/ or think about adding axis titles, changing colour schemes or even adding the correlation results onto the plots.

Task

Can you do the same thing for the asymmetry ratio? This will require a little more data wrangling but you should all now be experts in R!

Useful reading

Lolli, L., Batterham, A.M., Hawkins, R., Kelly, D.M., Strudwick, A.J., Thorpe, R.T., Gregson, W. and Atkinson, G., 2019. The acute-to-chronic workload ratio: an inaccurate scaling index for an unnecessary normalisation process?. British Journal of Sports Medicine, 53(24), pp.1510-1512.

Lolli, L., Batterham, A.M., Kratochvíl, L., Flegr, J., Weston, K.L. and Atkinson, G., 2017. A comprehensive allometric analysis of 2nd digit length to 4th digit length in humans. Proceedings of the Royal Society B: Biological Sciences, 284(1857), p.20170356.