Natasha

Dissertation data analysis

This document will provide you with guidance on how to run

Running Code

When you click the Render button a document will be generated that includes both content and the output of embedded code. You can embed code like this:

Robust statistical methods

Field, A. P., & Wilcox, R. R. (2017). Robust statistical methods: A primer for clinical psychology and experimental psychopathology researchers. Behaviour research and therapy, 98, 19-38.

You will need the following packages:

library(readxl)
library(tidyverse)
library(janitor)

Read in your data

Set your working directory and read in your data from each tab on your excel sheet:

setwd("~/Library/CloudStorage/OneDrive-TeessideUniversity/Work/Teaching/Dissertations/Natasha")

cmj<- read_excel(" DataforNatasha.xlsx",  sheet = "CMJ")
hip<- read_excel(" DataforNatasha.xlsx",  sheet = "Ab_ad")
ham<- read_excel(" DataforNatasha.xlsx",  sheet = "Iso30")
cycle<- read_excel(" DataforNatasha.xlsx",  sheet = "Cycle")
RJ<- read_excel(" DataforNatasha.xlsx",  sheet = "10_5RJ")

Clean up and transform

Here you need to sort the row names out to make them easy to use in R, the code below uses the janitor packages to create “clean names”, I have then taken the ling names and made them simplier e.g. bw_kg is changed to bw and jump_height_imp_mom_cm is changed to cmj_ht.

CMJ data

names<-names(cmj%>%clean_names)
colnames(cmj) <- names

cmj <- cmj  %>%
  mutate(
    Date = as.Date(cmj$date,format = "%d/%m/%Y")) %>%  subset(., select = c(id, Date, bw_kg, jump_height_imp_mom_cm))%>%
  rename(bw = bw_kg, cmj_ht = jump_height_imp_mom_cm)

head(cmj)

# A tibble: 6 × 4
  id     Date          bw cmj_ht
  <chr>  <date>     <dbl>  <dbl>
1 FAW 01 2024-01-29  75.6   27  
2 FAW 01 2023-12-11  74.4   28  
3 FAW 01 2023-12-04  73.9   28.1
4 FAW 01 2023-11-27  74.9   28  
5 FAW 01 2023-11-20  75.1   27.8
6 FAW 01 2023-11-06  75.3   28.9

RSI (jump ht / contact time) in m/s

names<-names(RJ%>%clean_names)
colnames(RJ) <- names

RJ <- RJ  %>%
  mutate(
    Date = as.Date(RJ$date,format = "%d/%m/%Y")) %>%  subset(., select = c(id, Date, mean_rsi_jump_height_contact_time_m_s))%>%
  rename( rsi = mean_rsi_jump_height_contact_time_m_s)

head(RJ)

# A tibble: 6 × 3
  id     Date         rsi
  <chr>  <date>     <dbl>
1 FAW 09 2024-01-29  0.78
2 FAW 07 2024-01-29  1.9 
3 FAW 01 2024-01-29  1.29
4 FAW 09 2023-12-11  0.77
5 FAW 07 2023-12-11  1.47
6 FAW 01 2023-12-11  1.28

Clean up and transform hip data

Hip data is more difficult as we have left and right force for both the Pull and the Squeeze I have had to transform this into a “wide” format here and then taken the max of each leg for both pull and squeeze. I have called this hip_max but I have also left the left and right leg data in as a dataframe called hip if you want that data.

names<-names(hip%>%clean_names)
colnames(hip) <- names

hip <- hip  %>%
  mutate(
    Date = as.Date(hip$date,format = "%d/%m/%Y")) %>%  subset(., select = c(id, Date, direction, r_max_force_n, l_max_force_n))%>%
  rename(r_max = r_max_force_n, l_max = l_max_force_n)

hip_max <- hip %>%
  mutate(
    Hip_max = pmax(r_max , l_max)
  ) %>%
  select(id, Date, direction, Hip_max)

hip = hip%>%
  pivot_wider(names_from = direction, values_from = c(4:5))


hip_max = hip_max %>%
  pivot_wider(names_from = direction, values_from = Hip_max)

head(hip_max)

# A tibble: 6 × 4
  id     Date        Pull Squeeze
  <chr>  <date>     <dbl>   <dbl>
1 FAW 01 2024-01-29  329.    312.
2 FAW 05 2024-01-29  454.    374.
3 FAW 07 2024-01-29  280     257.
4 FAW 09 2024-01-29  359.    319.
5 FAW 01 2023-12-11  365.    379 
6 FAW 09 2023-12-11  324.    330.

Nord board (ISO 30)

names<-names(ham%>%clean_names)
colnames(ham) <- names

ham<- ham  %>%
  mutate(
    Date = as.Date(ham$date,format = "%d/%m/%Y")) %>%  subset(., select = c(id, Date, r_max_force_n, l_max_force_n))%>%
  rename(r_ham = r_max_force_n, l_ham = l_max_force_n)


ham_max <- ham %>%
  mutate(
    Ham_max = pmax(r_ham , l_ham)
  ) %>%
  select(id, Date, Ham_max)

head(ham_max)

# A tibble: 6 × 3
  id     Date       Ham_max
  <chr>  <date>       <dbl>
1 FAW 07 2024-01-29    230.
2 FAW 01 2024-01-29    255.
3 FAW 05 2024-01-29    226.
4 FAW 09 2023-12-11    288.
5 FAW 05 2023-12-11    266 
6 FAW 07 2023-12-11    240

Cycle Data

names<-names(cycle%>%clean_names)
colnames(cycle) <- names

cycle<- cycle  %>%
  mutate(
    Date = as.Date(cycle$date,format = "%d/%m/%Y")) %>%  subset(., select = c(id, Date, phase))

head(cycle)

# A tibble: 6 × 3
  id     Date       phase
  <chr>  <date>     <chr>
1 FAW 10 2023-11-20 ML   
2 FAW 10 2023-12-04 EF   
3 FAW 10 2023-12-11 <NA> 
4 FAW 09 2023-10-16 EF   
5 FAW 09 2023-10-23 EF   
6 FAW 09 2024-01-09 LF

 data <- full_join(cycle, RJ, by = c("id", "Date"))%>% 
      left_join(., cmj, 
                by = c("id", "Date"))%>% 
      left_join(., hip_max, 
                by = c("id", "Date"))%>% 
      left_join(., ham_max, 
                by = c("id", "Date"))
 
 
data <- data %>%
    filter(id != "FAW 01")

Graphing your data

Below is a simple box plot with jittered points, you might want to change the color schemes (I like vidris personally but there are lots of good options - you can use AI (Chat GTP to ask for ideas if you want to make the plot look really nice).

If you want to combine several plots into 1 figure you can use a package called ggpubr library(ggpubr) and then use the code ggarrange()

To do this name your box plots e.g. a<- data %>% filter(!is.na(phase)) %>% ggplot(aes(x= phase, y= rsi, fill = phase))

etc….

data %>%
  filter(!is.na(phase)) %>%
  ggplot(aes(x= phase, y= rsi, fill = phase))+
  geom_boxplot()  +
  geom_jitter() +
  labs(y = "CMJ Height (cm)") +
  theme_classic() + theme(legend.position = "none")

Statistical analysis

I have included code below for a normal “mixed linear model” and a robust mixed linear model. In a nutshell robust methods are less influenced by outlines and thus better in your situation (probably generally better to use these according to Field and Wilcox see link and reference:

Robust statistical methods

Field, A. P., & Wilcox, R. R. (2017). Robust statistical methods: A primer for clinical psychology and experimental psychopathology researchers. Behaviour research and therapy, 98, 19-38.

Standard mixed linear model

library(lme4) #for linear models
library(emmeans) #for estimated marginal means (i.e., differnces between phases, p-valyes and confidence intervals)

lm_cmj<- lmer(cmj_ht ~ 1 + phase + (1|id), data = data)
summary(lm_cmj)

Linear mixed model fit by REML ['lmerMod']
Formula: cmj_ht ~ 1 + phase + (1 | id)
   Data: data

REML criterion at convergence: 140.4

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.02149 -0.34897  0.00323  0.69535  1.91308 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 8.159    2.856   
 Residual             2.691    1.640   
Number of obs: 35, groups:  id, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)  22.0362     1.2668  17.396
phaseLF       2.1416     0.8622   2.484
phaseML       0.3020     0.8412   0.359
phaseO        1.1310     0.7927   1.427

Correlation of Fixed Effects:
        (Intr) phasLF phasML
phaseLF -0.215              
phaseML -0.229  0.297       
phaseO  -0.220  0.387  0.316

emm_cmj <- emmeans(lm_cmj, pairwise ~ phase)
emm_cmj

$emmeans
 phase emmean   SE   df lower.CL upper.CL
 EF      22.0 1.27 5.90     18.9     25.2
 LF      24.2 1.37 7.96     21.0     27.3
 ML      22.3 1.35 7.61     19.2     25.5
 O       23.2 1.34 7.31     20.0     26.3

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast estimate    SE   df t.ratio p.value
 EF - LF    -2.142 0.868 26.7  -2.468  0.0884
 EF - ML    -0.302 0.845 26.5  -0.357  0.9840
 EF - O     -1.131 0.797 26.6  -1.420  0.4986
 LF - ML     1.840 1.015 26.5   1.812  0.2901
 LF - O      1.011 0.918 26.0   1.101  0.6923
 ML - O     -0.829 0.960 26.5  -0.863  0.8236

Degrees-of-freedom method: kenward-roger 
P value adjustment: tukey method for comparing a family of 4 estimates

confint(emm_cmj, level = 0.95)

$emmeans
 phase emmean   SE   df lower.CL upper.CL
 EF      22.0 1.27 5.90     18.9     25.2
 LF      24.2 1.37 7.96     21.0     27.3
 ML      22.3 1.35 7.61     19.2     25.5
 O       23.2 1.34 7.31     20.0     26.3

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast estimate    SE   df lower.CL upper.CL
 EF - LF    -2.142 0.868 26.7   -4.518    0.235
 EF - ML    -0.302 0.845 26.5   -2.617    2.013
 EF - O     -1.131 0.797 26.6   -3.313    1.051
 LF - ML     1.840 1.015 26.5   -0.942    4.621
 LF - O      1.011 0.918 26.0   -1.508    3.529
 ML - O     -0.829 0.960 26.5   -3.461    1.803

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 
Conf-level adjustment: tukey method for comparing a family of 4 estimates

Robust mixed linear model

library(robustlmm) #for robust linear models
library(emmeans) #for estimated marginal means (i.e., differnces between phases, p-valyes and confidence intervals)

robust_cmj<- rlmer(cmj_ht ~ 1 + phase + (1|id), data = data)
residuals <- residuals(robust_cmj)

# Perform normality tests (e.g., Shapiro-Wilk test)
shapiro.test(residuals)


    Shapiro-Wilk normality test

data:  residuals
W = 0.95485, p-value = 0.1596

# Alternatively, you can create a QQ plot
qqnorm(residuals)
qqline(residuals)

summary(robust_cmj)

Robust linear mixed model fit by DAStau 
Formula: cmj_ht ~ 1 + phase + (1 | id) 
   Data: data 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.41668 -0.36684 -0.06528  0.68773  2.30453 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 10.196   3.193   
 Residual              2.191   1.480   
Number of obs: 35, groups: id, 6

Fixed effects:
            Estimate Std. Error t value
(Intercept)  22.0965     1.4139  15.628
phaseLF       2.1078     0.7997   2.636
phaseML       0.2947     0.7797   0.378
phaseO        1.2197     0.7349   1.660

Correlation of Fixed Effects:
        (Intr) phasLF phasML
phaseLF -0.179              
phaseML -0.191  0.297       
phaseO  -0.183  0.390  0.316

Robustness weights for the residuals: 
 30 weights are ~= 1. The remaining 5 ones are
    6    14    25    26    30 
0.923 0.710 0.584 0.584 0.556 

Robustness weights for the random effects: 
[1] 1.00 1.00 1.00 1.00 1.00 0.91

Rho functions used for fitting:
  Residuals:
    eff: smoothed Huber (k = 1.345, s = 10) 
    sig: smoothed Huber, Proposal 2 (k = 1.345, s = 10) 
  Random Effects, variance component 1 (id):
    eff: smoothed Huber (k = 1.345, s = 10) 
    vcp: smoothed Huber, Proposal 2 (k = 1.345, s = 10)

emm_robust_cmj <- emmeans(robust_cmj, pairwise ~ phase)
emm_robust_cmj

$emmeans
 phase emmean   SE  df asymp.LCL asymp.UCL
 EF      22.1 1.41 Inf      19.3      24.9
 LF      24.2 1.49 Inf      21.3      27.1
 ML      22.4 1.48 Inf      19.5      25.3
 O       23.3 1.47 Inf      20.4      26.2

Confidence level used: 0.95 

$contrasts
 contrast estimate    SE  df z.ratio p.value
 EF - LF    -2.108 0.800 Inf  -2.636  0.0418
 EF - ML    -0.295 0.780 Inf  -0.378  0.9816
 EF - O     -1.220 0.735 Inf  -1.660  0.3452
 LF - ML     1.813 0.937 Inf   1.936  0.2131
 LF - O      0.888 0.850 Inf   1.045  0.7227
 ML - O     -0.925 0.887 Inf  -1.043  0.7239

P value adjustment: tukey method for comparing a family of 4 estimates

confint(emm_robust_cmj, level = 0.95)

$emmeans
 phase emmean   SE  df asymp.LCL asymp.UCL
 EF      22.1 1.41 Inf      19.3      24.9
 LF      24.2 1.49 Inf      21.3      27.1
 ML      22.4 1.48 Inf      19.5      25.3
 O       23.3 1.47 Inf      20.4      26.2

Confidence level used: 0.95 

$contrasts
 contrast estimate    SE  df asymp.LCL asymp.UCL
 EF - LF    -2.108 0.800 Inf    -4.162   -0.0533
 EF - ML    -0.295 0.780 Inf    -2.298    1.7084
 EF - O     -1.220 0.735 Inf    -3.108    0.6682
 LF - ML     1.813 0.937 Inf    -0.593    4.2194
 LF - O      0.888 0.850 Inf    -1.294    3.0706
 ML - O     -0.925 0.887 Inf    -3.203    1.3526

Confidence level used: 0.95 
Conf-level adjustment: tukey method for comparing a family of 4 estimates

You can see here there is a 2.1 cm difference in jump height (95% CI 0.05to 4.16 cm) which is statistically significant p = 0,418