Data lab worksheet

Load packages

Here we are going to take a look at some force frame data for a group of girls football players. All data has been collected under an annonymous ID as it was sent to the VALD hub and has subsequently been downloaded as a .csv file - this you have access to on blackboard.

Install and load relevant packates here: this code lists the names the packages you might need (pakages) highlights any that you don’t have (new.packages) and installs these and then loads them all for you.

packages <- c(  "ggplot2", "ggpubr" ,"readr" , "tidyverse", "Hmisc","plyr", "dplyr" ,"janitor", "readxl", "kableExtra")

new.packages <- packages[!(packages %in% installed.packages()[,"Package"])]

if(length(new.packages)) install.packages(new.packages)
lapply(packages, library, character.only = TRUE)
options(scipen = 9999)
set.seed(123)

Read data

Let’s now set our working directory and read in the data. I have a line of code here that takes the externa_id and changes this to a name based upon a separate excel file. To maintain confidentiality I have set this not to run but left the code there so you can see this is an option for you when working with team data downloaded from an external site like VALD hub.

I have used a clever bit of code from the janitor packages that changes the header names into ones you can use when coding:

names<-names(hip%>%clean_names)

colnames(hip) <- names

setwd("~/Library/CloudStorage/OneDrive-TeessideUniversity/Work/RTC/RTC_ETC_2022_2023/Fitness testing/Strength")

hip <- read_csv("forceframe-test-export-19_01_2023.csv")

names<-names(hip%>%clean_names)
colnames(hip) <- names
hip<-subset(hip, select = -c(name,time_utc, device,  mode, test, notes))
#names(hip)[names(hip) == "external_id"] <- "ID"
#hip <- merge(hip, ID, by = "ID")
hip<-hip[hip$l_reps != 0, ] #Thus removes rows with zeros
head(hip)

## # A tibble: 6 × 19
##   extern…¹ date_…² direc…³ posit…⁴ l_reps r_reps l_max…⁵ r_max…⁶ max_i…⁷ l_max…⁸
##   <chr>    <chr>   <chr>   <chr>    <dbl>  <dbl>   <dbl>   <dbl>   <dbl> <chr>  
## 1 P27      09/01/… Pull    Hip AD…      3      3    241.    259.    6.81 1.04   
## 2 P27      09/01/… Squeeze Hip AD…      3      3    252.    261.    3.64 1.04   
## 3 P35      09/01/… Pull    Hip AD…      3      3    238.    205   -13.7  1.14   
## 4 P35      09/01/… Squeeze Hip AD…      3      3    271.    215   -20.7  1.14   
## 5 P11      09/01/… Pull    Hip AD…      3      3    278     269.   -3.15 0.83   
## 6 P11      09/01/… Squeeze Hip AD…      3      3    232.    249     7.03 0.83   
## # … with 9 more variables: r_max_ratio <chr>, l_avg_force_n <dbl>,
## #   r_avg_force_n <dbl>, avg_imbalance <dbl>, l_avg_ratio <chr>,
## #   r_avg_ratio <chr>, l_max_impulse_ns <dbl>, r_max_impulse_ns <dbl>,
## #   impulse_imbalance_percent <dbl>, and abbreviated variable names
## #   ¹external_id, ²date_utc, ³direction, ⁴position, ⁵l_max_force_n,
## #   ⁶r_max_force_n, ⁷max_imbalance, ⁸l_max_ratio

Let’s cut this down a little to just include some key variables left and right leg max force. we’ll also keep the max ratios in (abduction - adduction and the left-right imbalance for now.

## # A tibble: 6 × 7
##   external_id direction l_max_force_n r_max_force_n max_imbala…¹ l_max…² r_max…³
##   <chr>       <chr>             <dbl>         <dbl>        <dbl> <chr>   <chr>  
## 1 P27         Pull               241.          259.         6.81 1.04    1.01   
## 2 P27         Squeeze            252.          261.         3.64 1.04    1.01   
## 3 P35         Pull               238.          205        -13.7  1.14    1.05   
## 4 P35         Squeeze            271.          215        -20.7  1.14    1.05   
## 5 P11         Pull               278           269.        -3.15 0.83    0.92   
## 6 P11         Squeeze            232.          249          7.03 0.83    0.92   
## # … with abbreviated variable names ¹max_imbalance, ²l_max_ratio, ³r_max_ratio

Now let’s do some simple visualizations

The following code will graph the squad data.

ggplot(hip, aes(x = reorder(external_id, l_max_force_n), y = l_max_force_n, fill = external_id)) +
geom_col() +
scale_fill_viridis_d(option = "viridis", direction = 1) +
coord_flip() +
theme_classic() +
theme(legend.position = "none")+
  facet_wrap(vars(direction))

ggplot(hip, aes(x = reorder(external_id, l_max_force_n), y = l_max_force_n, fill = external_id)) +
geom_col() +
scale_fill_viridis_d(option = "viridis", direction = 1) +
coord_flip() +
theme_classic() +
theme(legend.position = "none")+
  facet_wrap(vars(direction))

ggplot(hip, aes(x = reorder(external_id, max_imbalance), y = max_imbalance, fill = external_id)) +
geom_col() +
scale_fill_viridis_d(option = "viridis", direction = 1) +
coord_flip() +
theme_classic() +
theme(legend.position = "none")+
  facet_wrap(vars(direction))

Ranking data

This is some code I borrowed from Patrick Ward’s blog and it is a nice way to rank squad data and create z and t-scores. and might be useful if you are working with team data

z_score <- function(x, avg, SD){
  z = (x - avg) / SD
  return(z)
}

t_score <- function(z){
  t = z * 10 + 50
  return(t)
}

perc.rank <- function(x){
  trunc(rank(x))/length(x)
}

rank_pull <- hip %>%
  filter(direction == "Pull")

left_rank_pull <- rank_pull %>%
  mutate(percentile_rank_left = perc.rank(l_max_force_n))


left_rank_pull %>% 
  arrange(desc(percentile_rank_left)) %>%
  knitr::kable()

external_id	direction	l_max_force_n	r_max_force_n	max_imbalance	l_max_ratio	r_max_ratio	percentile_rank_left
P03	Pull	361.125	332.500	-7.9266182	0.98	1.05	1.0000000
P18	Pull	354.500	348.250	-1.7630465	0.87	0.81	0.9677419
P54	Pull	327.750	344.750	4.9311095	0.85	0.83	0.9354839
P26	Pull	304.750	313.000	2.6357827	1.06	1	0.9032258
P16	Pull	299.000	310.750	3.7811746	1.21	1.15	0.8709677
P01	Pull	292.250	318.500	8.2417582	1.13	1.02	0.8387097
P29	Pull	288.000	288.000	0.0000000	1.2	1.18	0.8064516
P13	Pull	282.500	283.875	0.4843681	0.94	0.96	0.7741935
P23	Pull	282.000	273.000	-3.1914894	1.07	1.03	0.7419355
P11	Pull	278.000	269.250	-3.1474820	0.83	0.92	0.7096774
P55	Pull	258.375	288.250	10.3642671	1.09	0.96	0.6774194
P31	Pull	256.750	264.750	3.0217186	1.35	1.25	0.6451613
P46	Pull	256.250	249.875	-2.4878049	1.03	1.02	0.6129032
P30	Pull	255.000	272.000	6.2500000	1.06	1.02	0.5806452
P10	Pull	254.750	275.250	7.4477747	1.06	1	0.5483871
P41	Pull	254.250	285.375	10.9067017	1.38	1.31	0.5161290
P17	Pull	252.000	309.500	18.5783522	1.48	1.27	0.4838710
P50	Pull	251.000	245.500	-2.1912351	0.99	1.14	0.4516129
P08	Pull	246.625	242.750	-1.5712114	0.99	1	0.4193548
P27	Pull	241.250	258.875	6.8083052	1.04	1.01	0.3870968
P14	Pull	239.500	259.500	7.7071291	1.21	1.1	0.3548387
P35	Pull	237.625	205.000	-13.7296160	1.14	1.05	0.3225806
P53	Pull	235.750	230.500	-2.2269353	1.08	1.17	0.2903226
P09	Pull	224.750	216.500	-3.6707453	1.18	1.38	0.2580645
P57	Pull	224.000	268.750	16.6511628	1.32	0.92	0.2258065
P24	Pull	223.500	209.000	-6.4876957	0.96	1.21	0.1935484
P32	Pull	220.750	282.250	21.7891940	1.47	1.12	0.1612903
P47	Pull	211.500	214.750	1.5133877	1.24	1.2	0.1290323
P28	Pull	205.875	212.000	2.8891509	1.06	1.07	0.0967742
P42	Pull	201.625	212.750	5.2291422	1.35	1.26	0.0645161
P58	Pull	153.250	145.875	-4.8123980	1.3	1.43	0.0322581

You can then plot your ranked data:

ggplot(left_rank_pull, aes(x = reorder(external_id, l_max_force_n), y = l_max_force_n, fill = external_id)) +
geom_col() +
scale_fill_viridis_d(option = "viridis", direction = 1) +
coord_flip() +
theme_classic() +
theme(legend.position = "none")

Ratio scaling

Lets first explore the abduction-adduction ratio, the Vald software has already derived our max ratio (squeeze [abduction] / pull [adduction]) although we will need to explore the relationship between each part of the ratio.

Here we will need to reformat the data set so we can graph squeeze and pull strength. There should be a relationship between these variables, otherwise there would be no rationale for normailsing squeeze strength by using pull strength.

To do this we need the data in wide format so we have squeeze and pull in separate columns, here we are going to subset the data so we only have the columns we need and then we will use the pivot_wider() function.

# Select the columns you want to pivot
df_wide <- subset(hip, select=c(external_id, direction, l_max_force_n, r_max_force_n))

# Pivot the data wider
df_wide <- pivot_wider(data = df_wide, names_from = direction, values_from = c(l_max_force_n, r_max_force_n))

# Print the resulting dataframe
head(df_wide)

## # A tibble: 6 × 5
##   external_id l_max_force_n_Pull l_max_force_n_Squeeze r_max_force_n_P…¹ r_max…²
##   <chr>                    <dbl>                 <dbl>             <dbl>   <dbl>
## 1 P27                       241.                  252.              259.    261.
## 2 P35                       238.                  271.              205     215 
## 3 P11                       278                   232.              269.    249 
## 4 P41                       254.                  351               285.    373.
## 5 P58                       153.                  199               146.    208.
## 6 P57                       224                   296.              269.    247 
## # … with abbreviated variable names ¹r_max_force_n_Pull, ²r_max_force_n_Squeeze

No we have pull and squeeze force for each player for left and right leg, all in separate columns. This mean’s we can graph the linear relationship and perform a simple correlation. Here is the code for the left leg:

ggscatter(df_wide, x = "l_max_force_n_Pull", y = "l_max_force_n_Squeeze") + 
  geom_point(colour = "blue") +
  geom_smooth(method=lm, se = TRUE)

cor.test(df_wide$l_max_force_n_Pull,df_wide$l_max_force_n_Squeeze)

## 
##  Pearson's product-moment correlation
## 
## data:  df_wide$l_max_force_n_Pull and df_wide$l_max_force_n_Squeeze
## t = 3.5266, df = 29, p-value = 0.001422
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2401186 0.7555214
## sample estimates:
##      cor 
## 0.547846

Good news! These are related r = 0.55 95%CI .24 to 0.76 - great lets look at the ratio a bit further. NOTE: You may wish to do the same for the right leg.

We also need to filter the data to include just the denominator (pull force) and ensure the ratio is a numeric as it defaults to a factor. We can then graph the relationship between the denominator and the ratio.

If the ratio scales correctly there should be no relationship between the two.

ratio<- hip %>% filter(direction %in% "Pull") 

ratio$l_max_ratio = as.numeric(ratio$l_max_ratio)

ggscatter(ratio, x = "l_max_force_n", y = "l_max_ratio") + 
  geom_point(colour = "blue") +
  geom_smooth(method=lm, se = TRUE)

cor.test(ratio$l_max_force_n,ratio$l_max_ratio)

## 
##  Pearson's product-moment correlation
## 
## data:  ratio$l_max_force_n and ratio$l_max_ratio
## t = -3.1677, df = 29, p-value = 0.003603
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.7301747 -0.1861102
## sample estimates:
##        cor 
## -0.5070146

For the left limb we have a clear relationship where the stronger players have a lower ratio r = -0.51, 95% CO -0.73 to -0.19. So be cautious - if you present the ratio with considering its component parts you could be missing something. You could repeat this for the right limb and see if we get the same, or similar a result.

You could use gg arrange to put left versus right graphs together and/ or think about adding axis titles, changing colour schemes or even adding the correlation results onto the plots.

Task

Can you do the same thing for the asymmetry ratio? This will require a little more data wrangling but you should all now be experts in R!

Useful reading

Lolli, L., Batterham, A.M., Hawkins, R., Kelly, D.M., Strudwick, A.J., Thorpe, R.T., Gregson, W. and Atkinson, G., 2019. The acute-to-chronic workload ratio: an inaccurate scaling index for an unnecessary normalisation process?. British Journal of Sports Medicine, 53(24), pp.1510-1512.

Lolli, L., Batterham, A.M., Kratochvíl, L., Flegr, J., Weston, K.L. and Atkinson, G., 2017. A comprehensive allometric analysis of 2nd digit length to 4th digit length in humans. Proceedings of the Royal Society B: Biological Sciences, 284(1857), p.20170356.

Exploring the ratio between abduction and adduction strength

MSc S&C for performance and rehabilitation

2023-01-19