library(readxl)
library(tidyverse)
library(janitor)Natasha
Dissertation data analysis
This document will provide you with guidance on how to run
Running Code
When you click the Render button a document will be generated that includes both content and the output of embedded code. You can embed code like this:
Field, A. P., & Wilcox, R. R. (2017). Robust statistical methods: A primer for clinical psychology and experimental psychopathology researchers. Behaviour research and therapy, 98, 19-38.
You will need the following packages:
Read in your data
Set your working directory and read in your data from each tab on your excel sheet:
setwd("~/Library/CloudStorage/OneDrive-TeessideUniversity/Work/Teaching/Dissertations/Natasha")
cmj<- read_excel(" DataforNatasha.xlsx", sheet = "CMJ")
hip<- read_excel(" DataforNatasha.xlsx", sheet = "Ab_ad")
ham<- read_excel(" DataforNatasha.xlsx", sheet = "Iso30")
cycle<- read_excel(" DataforNatasha.xlsx", sheet = "Cycle")
RJ<- read_excel(" DataforNatasha.xlsx", sheet = "10_5RJ")Clean up and transform
Here you need to sort the row names out to make them easy to use in R, the code below uses the janitor packages to create “clean names”, I have then taken the ling names and made them simplier e.g. bw_kg is changed to bw and jump_height_imp_mom_cm is changed to cmj_ht.
CMJ data
names<-names(cmj%>%clean_names)
colnames(cmj) <- names
cmj <- cmj %>%
mutate(
Date = as.Date(cmj$date,format = "%d/%m/%Y")) %>% subset(., select = c(id, Date, bw_kg, jump_height_imp_mom_cm))%>%
rename(bw = bw_kg, cmj_ht = jump_height_imp_mom_cm)
head(cmj)# A tibble: 6 × 4
id Date bw cmj_ht
<chr> <date> <dbl> <dbl>
1 FAW 01 2024-01-29 75.6 27
2 FAW 01 2023-12-11 74.4 28
3 FAW 01 2023-12-04 73.9 28.1
4 FAW 01 2023-11-27 74.9 28
5 FAW 01 2023-11-20 75.1 27.8
6 FAW 01 2023-11-06 75.3 28.9
RSI (jump ht / contact time) in m/s
names<-names(RJ%>%clean_names)
colnames(RJ) <- names
RJ <- RJ %>%
mutate(
Date = as.Date(RJ$date,format = "%d/%m/%Y")) %>% subset(., select = c(id, Date, mean_rsi_jump_height_contact_time_m_s))%>%
rename( rsi = mean_rsi_jump_height_contact_time_m_s)
head(RJ)# A tibble: 6 × 3
id Date rsi
<chr> <date> <dbl>
1 FAW 09 2024-01-29 0.78
2 FAW 07 2024-01-29 1.9
3 FAW 01 2024-01-29 1.29
4 FAW 09 2023-12-11 0.77
5 FAW 07 2023-12-11 1.47
6 FAW 01 2023-12-11 1.28
Clean up and transform hip data
Hip data is more difficult as we have left and right force for both the Pull and the Squeeze I have had to transform this into a “wide” format here and then taken the max of each leg for both pull and squeeze. I have called this hip_max but I have also left the left and right leg data in as a dataframe called hip if you want that data.
names<-names(hip%>%clean_names)
colnames(hip) <- names
hip <- hip %>%
mutate(
Date = as.Date(hip$date,format = "%d/%m/%Y")) %>% subset(., select = c(id, Date, direction, r_max_force_n, l_max_force_n))%>%
rename(r_max = r_max_force_n, l_max = l_max_force_n)
hip_max <- hip %>%
mutate(
Hip_max = pmax(r_max , l_max)
) %>%
select(id, Date, direction, Hip_max)
hip = hip%>%
pivot_wider(names_from = direction, values_from = c(4:5))
hip_max = hip_max %>%
pivot_wider(names_from = direction, values_from = Hip_max)
head(hip_max)# A tibble: 6 × 4
id Date Pull Squeeze
<chr> <date> <dbl> <dbl>
1 FAW 01 2024-01-29 329. 312.
2 FAW 05 2024-01-29 454. 374.
3 FAW 07 2024-01-29 280 257.
4 FAW 09 2024-01-29 359. 319.
5 FAW 01 2023-12-11 365. 379
6 FAW 09 2023-12-11 324. 330.
Nord board (ISO 30)
names<-names(ham%>%clean_names)
colnames(ham) <- names
ham<- ham %>%
mutate(
Date = as.Date(ham$date,format = "%d/%m/%Y")) %>% subset(., select = c(id, Date, r_max_force_n, l_max_force_n))%>%
rename(r_ham = r_max_force_n, l_ham = l_max_force_n)
ham_max <- ham %>%
mutate(
Ham_max = pmax(r_ham , l_ham)
) %>%
select(id, Date, Ham_max)
head(ham_max)# A tibble: 6 × 3
id Date Ham_max
<chr> <date> <dbl>
1 FAW 07 2024-01-29 230.
2 FAW 01 2024-01-29 255.
3 FAW 05 2024-01-29 226.
4 FAW 09 2023-12-11 288.
5 FAW 05 2023-12-11 266
6 FAW 07 2023-12-11 240
Cycle Data
names<-names(cycle%>%clean_names)
colnames(cycle) <- names
cycle<- cycle %>%
mutate(
Date = as.Date(cycle$date,format = "%d/%m/%Y")) %>% subset(., select = c(id, Date, phase))
head(cycle)# A tibble: 6 × 3
id Date phase
<chr> <date> <chr>
1 FAW 10 2023-11-20 ML
2 FAW 10 2023-12-04 EF
3 FAW 10 2023-12-11 <NA>
4 FAW 09 2023-10-16 EF
5 FAW 09 2023-10-23 EF
6 FAW 09 2024-01-09 LF
data <- full_join(cycle, RJ, by = c("id", "Date"))%>%
left_join(., cmj,
by = c("id", "Date"))%>%
left_join(., hip_max,
by = c("id", "Date"))%>%
left_join(., ham_max,
by = c("id", "Date"))
data <- data %>%
filter(id != "FAW 01")Graphing your data
Below is a simple box plot with jittered points, you might want to change the color schemes (I like vidris personally but there are lots of good options - you can use AI (Chat GTP to ask for ideas if you want to make the plot look really nice).
If you want to combine several plots into 1 figure you can use a package called ggpubr library(ggpubr) and then use the code ggarrange()
To do this name your box plots e.g. a<- data %>% filter(!is.na(phase)) %>% ggplot(aes(x= phase, y= rsi, fill = phase))
etc….
data %>%
filter(!is.na(phase)) %>%
ggplot(aes(x= phase, y= rsi, fill = phase))+
geom_boxplot() +
geom_jitter() +
labs(y = "CMJ Height (cm)") +
theme_classic() + theme(legend.position = "none")Statistical analysis
I have included code below for a normal “mixed linear model” and a robust mixed linear model. In a nutshell robust methods are less influenced by outlines and thus better in your situation (probably generally better to use these according to Field and Wilcox see link and reference:
Field, A. P., & Wilcox, R. R. (2017). Robust statistical methods: A primer for clinical psychology and experimental psychopathology researchers. Behaviour research and therapy, 98, 19-38.
Standard mixed linear model
library(lme4) #for linear models
library(emmeans) #for estimated marginal means (i.e., differnces between phases, p-valyes and confidence intervals)
lm_cmj<- lmer(cmj_ht ~ 1 + phase + (1|id), data = data)
summary(lm_cmj)Linear mixed model fit by REML ['lmerMod']
Formula: cmj_ht ~ 1 + phase + (1 | id)
Data: data
REML criterion at convergence: 140.4
Scaled residuals:
Min 1Q Median 3Q Max
-2.02149 -0.34897 0.00323 0.69535 1.91308
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 8.159 2.856
Residual 2.691 1.640
Number of obs: 35, groups: id, 6
Fixed effects:
Estimate Std. Error t value
(Intercept) 22.0362 1.2668 17.396
phaseLF 2.1416 0.8622 2.484
phaseML 0.3020 0.8412 0.359
phaseO 1.1310 0.7927 1.427
Correlation of Fixed Effects:
(Intr) phasLF phasML
phaseLF -0.215
phaseML -0.229 0.297
phaseO -0.220 0.387 0.316
emm_cmj <- emmeans(lm_cmj, pairwise ~ phase)
emm_cmj$emmeans
phase emmean SE df lower.CL upper.CL
EF 22.0 1.27 5.90 18.9 25.2
LF 24.2 1.37 7.96 21.0 27.3
ML 22.3 1.35 7.61 19.2 25.5
O 23.2 1.34 7.31 20.0 26.3
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
EF - LF -2.142 0.868 26.7 -2.468 0.0884
EF - ML -0.302 0.845 26.5 -0.357 0.9840
EF - O -1.131 0.797 26.6 -1.420 0.4986
LF - ML 1.840 1.015 26.5 1.812 0.2901
LF - O 1.011 0.918 26.0 1.101 0.6923
ML - O -0.829 0.960 26.5 -0.863 0.8236
Degrees-of-freedom method: kenward-roger
P value adjustment: tukey method for comparing a family of 4 estimates
confint(emm_cmj, level = 0.95)$emmeans
phase emmean SE df lower.CL upper.CL
EF 22.0 1.27 5.90 18.9 25.2
LF 24.2 1.37 7.96 21.0 27.3
ML 22.3 1.35 7.61 19.2 25.5
O 23.2 1.34 7.31 20.0 26.3
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
$contrasts
contrast estimate SE df lower.CL upper.CL
EF - LF -2.142 0.868 26.7 -4.518 0.235
EF - ML -0.302 0.845 26.5 -2.617 2.013
EF - O -1.131 0.797 26.6 -3.313 1.051
LF - ML 1.840 1.015 26.5 -0.942 4.621
LF - O 1.011 0.918 26.0 -1.508 3.529
ML - O -0.829 0.960 26.5 -3.461 1.803
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
Conf-level adjustment: tukey method for comparing a family of 4 estimates
Robust mixed linear model
library(robustlmm) #for robust linear models
library(emmeans) #for estimated marginal means (i.e., differnces between phases, p-valyes and confidence intervals)
robust_cmj<- rlmer(cmj_ht ~ 1 + phase + (1|id), data = data)
residuals <- residuals(robust_cmj)
# Perform normality tests (e.g., Shapiro-Wilk test)
shapiro.test(residuals)
Shapiro-Wilk normality test
data: residuals
W = 0.95485, p-value = 0.1596
# Alternatively, you can create a QQ plot
qqnorm(residuals)
qqline(residuals)summary(robust_cmj)Robust linear mixed model fit by DAStau
Formula: cmj_ht ~ 1 + phase + (1 | id)
Data: data
Scaled residuals:
Min 1Q Median 3Q Max
-2.41668 -0.36684 -0.06528 0.68773 2.30453
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 10.196 3.193
Residual 2.191 1.480
Number of obs: 35, groups: id, 6
Fixed effects:
Estimate Std. Error t value
(Intercept) 22.0965 1.4139 15.628
phaseLF 2.1078 0.7997 2.636
phaseML 0.2947 0.7797 0.378
phaseO 1.2197 0.7349 1.660
Correlation of Fixed Effects:
(Intr) phasLF phasML
phaseLF -0.179
phaseML -0.191 0.297
phaseO -0.183 0.390 0.316
Robustness weights for the residuals:
30 weights are ~= 1. The remaining 5 ones are
6 14 25 26 30
0.923 0.710 0.584 0.584 0.556
Robustness weights for the random effects:
[1] 1.00 1.00 1.00 1.00 1.00 0.91
Rho functions used for fitting:
Residuals:
eff: smoothed Huber (k = 1.345, s = 10)
sig: smoothed Huber, Proposal 2 (k = 1.345, s = 10)
Random Effects, variance component 1 (id):
eff: smoothed Huber (k = 1.345, s = 10)
vcp: smoothed Huber, Proposal 2 (k = 1.345, s = 10)
emm_robust_cmj <- emmeans(robust_cmj, pairwise ~ phase)
emm_robust_cmj$emmeans
phase emmean SE df asymp.LCL asymp.UCL
EF 22.1 1.41 Inf 19.3 24.9
LF 24.2 1.49 Inf 21.3 27.1
ML 22.4 1.48 Inf 19.5 25.3
O 23.3 1.47 Inf 20.4 26.2
Confidence level used: 0.95
$contrasts
contrast estimate SE df z.ratio p.value
EF - LF -2.108 0.800 Inf -2.636 0.0418
EF - ML -0.295 0.780 Inf -0.378 0.9816
EF - O -1.220 0.735 Inf -1.660 0.3452
LF - ML 1.813 0.937 Inf 1.936 0.2131
LF - O 0.888 0.850 Inf 1.045 0.7227
ML - O -0.925 0.887 Inf -1.043 0.7239
P value adjustment: tukey method for comparing a family of 4 estimates
confint(emm_robust_cmj, level = 0.95)$emmeans
phase emmean SE df asymp.LCL asymp.UCL
EF 22.1 1.41 Inf 19.3 24.9
LF 24.2 1.49 Inf 21.3 27.1
ML 22.4 1.48 Inf 19.5 25.3
O 23.3 1.47 Inf 20.4 26.2
Confidence level used: 0.95
$contrasts
contrast estimate SE df asymp.LCL asymp.UCL
EF - LF -2.108 0.800 Inf -4.162 -0.0533
EF - ML -0.295 0.780 Inf -2.298 1.7084
EF - O -1.220 0.735 Inf -3.108 0.6682
LF - ML 1.813 0.937 Inf -0.593 4.2194
LF - O 0.888 0.850 Inf -1.294 3.0706
ML - O -0.925 0.887 Inf -3.203 1.3526
Confidence level used: 0.95
Conf-level adjustment: tukey method for comparing a family of 4 estimates
You can see here there is a 2.1 cm difference in jump height (95% CI 0.05to 4.16 cm) which is statistically significant p = 0,418