#loading packages
library(ggplot2) #main plotting system in R
library(readxl) #allows reading of excel files
library(googlesheets4) #enables access to google sheets
library(janitor) #data cleaning e.g., column names
library(tidyverse) #a package that includes other packages such as ggplot2.
library(lme4) #fits linear mixed-effects models
library(emmeans) #enables estimated marginal means
library(dplyr) #tidyverse package for data manipulation
library(esquisse) #provides a platform to build graphs
library(effectsize) #generates a number of statistics which aid interpretation beyond p-values.
library(shinydashboard) #extension from Shiny to produce dashboard
library(shiny) #building interactive web applications
library(kableExtra) #enhances tables for HTML output
library(readr) #for reading text data
Does Altitude Affect Match-Day Physical Performance and Post-Match Soreness of Elite Footballers?
The data presented in this report were collected over four Major League Soccer (MLS) seasons and include data from 19 outfield players from Philadelphia Union. This report investigates the impact of three altitude classifications (Sea (0-100m), Low (1000-2000m) and Medium (>2000m)) on match-day physical outputs, with a particular focus on high-speed running (HSR) and post-match soreness.
The impact of altitude on match-day performance and post-match soreness is underpinned by well-established physiological responses to hypoxia, a deficiency in the amount of oxygen reaching the tissues (Chen et al. 2020). Reduced oxygen availability at altitude lowers arterial oxygen saturation, which impairs aerobic energy production and limits the capacity for sustained high-intensity efforts (Gore, Clark, and Saunders 2007). As a result, athletes may rely more heavily on anaerobic pathways, leading to greater lactate accumulation, increased neuromuscular fatigue and heightened delayed onset muscle soreness (Chapman, Stray-Gundersen, and Levine 1998). Acute exposure, without time to acclimatise, has been shown to reduce total distance covered and sprint frequency in elite footballers during match play (Aughey et al. 2013). The Central Governor Theory also posits that the brain may downregulate muscle recruitment in hypoxic conditions to protect homeostasis, further affecting output (Noakes 2007).
The first step of the analysis was to load the necessary packages.
Set up working directory.
setwd("~/Library/CloudStorage/OneDrive-Personal/Teesside University/Semester 2/R Studio/Work Area/Assessment")
Imported data.
#import data
data <- read_csv("Assessment Data.csv")
view(data)
Column names were cleaned to put them all in a consistent format by removing special characters, spaces and uppercase letters, which keeps the data tidy and helps avoid errors.
data <- clean_names(data)
As the variables below are categorical, they were converted to factors.
colnames(data)
[1] "id" "date" "altitude_code"
[4] "gd" "replication" "timein_environment"
[7] "altitude" "duration" "distance"
[10] "distancemin" "high_speed_running" "hs_rmin"
[13] "player_load" "playerloadmin" "hr_zone3time"
[16] "hr_zone4time" "hr_zone5time" "hr_zone6time"
[19] "h_rzone_trimp" "soreness_1" "mood_1"
[22] "nutrition_1" "rpe_leg" "rpe_breathe"
[25] "rpe_tech" "rpe_session"
data$id <- as.factor(data$id)
#data$date <- as.Date(data$date,format="%m-%d-%Y") #date being an issue
data$altitude_code<- as.factor(data$altitude_code)
data$gd<- as.factor(data$gd)
data$replication<- as.factor(data$replication)
view(data)
The altitude levels were renamed from 1, 2 and 3 to descriptive labels to improve clarity and prevent confusion.
data <- data %>% #renamed altitude levels for clarity
mutate(altitude_code = recode(altitude_code,
`1` = "Sea",
`2` = "Low",
`3` = "Medium"))
The same was done with the game-day codes.
data <- data %>% #renamed game day for clarity
mutate(gd = recode(gd,
`0` = "MD",
`1` = "MD-1",
`2` = "MD-2"))
Data that were not needed were removed immediately. The heart rate data had many missing values, making them difficult to use.
#wrangle
data <- data %>% #removed heart rate data as was very sparse with lots of missing values
select(-c(hr_zone3time, hr_zone4time, hr_zone5time, hr_zone6time, h_rzone_trimp))
view(data)
Once the above steps were completed, I decided on the variables to analyse.
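The decision to drop the heart rate columns rests on how sparse they were. A minimal sketch of how that sparsity could be quantified is shown here; the toy data frame and the 50% threshold are assumptions for illustration, not values taken from the report's data.

```r
# Hypothetical toy data: column names mirror the report, values are invented
toy <- data.frame(
  distance     = c(9800, 10250, 9100, 10900),
  hr_zone3time = c(NA, NA, NA, 12.5),
  hr_zone4time = c(NA, 8.1, NA, NA)
)

# Proportion of missing values in each column
missing_share <- colMeans(is.na(toy))
print(round(missing_share, 2))

# Columns where more than half of the values are missing (threshold is an assumption)
sparse_cols <- names(missing_share)[missing_share > 0.5]
print(sparse_cols)
```

Running the same check on the imported data before the `select()` step would document why each column was removed.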
Previous studies have shown that altitude exposure reduces total distance and sprint frequency in footballers (Aughey et al. 2013). Consequently, I was interested in exploring whether a similar trend would emerge for HSR.
I aimed to visualise and explore whether altitude had any impact on HSR expressed as meters per minute across the different altitude levels. HSR was chosen due to its recognised importance in team sports, where it is considered a key determinant of successful performance (Gualtieri et al. 2023). HSR (m/min) was preferred over HSR (m) as it is a relative measure, allowing for fair comparisons between players with varying playing times.
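To illustrate why a relative measure allows fairer comparison, the sketch below uses two hypothetical players (the names and values are invented): one plays a full match and covers more high-speed distance in total, while the other plays half a match but works at a higher rate.

```r
# Hypothetical values for two players in the same match
hsr_m    <- c(player_a = 540, player_b = 300)  # high-speed running distance (m)
duration <- c(player_a = 90,  player_b = 45)   # minutes played

# Relative HSR: metres of high-speed running per minute played
hsr_per_min <- hsr_m / duration
print(round(hsr_per_min, 2))
```

On absolute distance player A appears to have done more, yet player B has the higher output per minute, which is the comparison the `hs_rmin` variable is intended to capture.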
ggplot(data %>% filter(gd == "MD")) + #filter for only match days
aes(x = altitude_code, y = hs_rmin, fill = altitude_code) + #chose hs_rmin to diminish the effects of minutes played.
geom_boxplot() +
geom_point( colour = "red") + #adding in the data points and colouring red
scale_fill_manual(
values = c(Sea = "#011925",
Low = "#418cdd",
Medium = "#c3a871")
) +
theme_classic() +
labs(
x = "Altitude Category", # Rename x-axis label
y = "High-Speed Running (m/min)", # Rename y-axis label
fill = "Altitude Level", # Rename legend title
title = "The Impact of Altitude on High-Speed Running (m/min) Performance"
)
match_data <- data %>% #subset containing match-day data only
filter(gd == "MD")
Following the above, I then wanted to analyse the same variables, HSR (m/min) and altitude, whilst taking into account the different participants in each category.
To analyse and interpret these data I used a linear mixed model, with a random intercept for each player, to assess the relationship between altitude exposure and performance metrics such as HSR and soreness (later in the analysis). This approach is widely used in sport science because it accounts for repeated measurements taken from the same individuals. Following the model, I applied estimated marginal means (EMM) to compare the adjusted means across conditions, which allowed for a more accurate interpretation of the effects of altitude on performance. This method is particularly useful for estimating differences between groups while accounting for the other terms in the model.
#analysing HSR at different altitude levels whilst taking into account different people in each category
l_model1 <- lmer(hs_rmin ~ altitude_code + (1 | id), match_data )
summary(l_model1)
Linear mixed model fit by REML ['lmerMod']
Formula: hs_rmin ~ altitude_code + (1 | id)
Data: match_data
REML criterion at convergence: 167.5
Scaled residuals:
Min 1Q Median 3Q Max
-1.59107 -0.56127 -0.05725 0.47690 2.02692
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 3.737 1.933
Residual 1.620 1.273
Number of obs: 42, groups: id, 20
Fixed effects:
Estimate Std. Error t value
(Intercept) 6.2710 0.5397 11.619
altitude_codeLow -0.9842 0.5124 -1.921
altitude_codeMedium -0.3737 0.6232 -0.600
Correlation of Fixed Effects:
(Intr) altt_L
altitd_cdLw -0.360
alttd_cdMdm -0.315 0.250
# comparing between altitudes
emm <- emmeans(l_model1, pairwise ~ altitude_code)
#confidence interval (95%) from estimated marginal means
confint(emm)$emmeans
altitude_code emmean SE df lower.CL upper.CL
Sea 6.27 0.543 25.5 5.15 7.39
Low 5.29 0.600 31.9 4.06 6.51
Medium 5.90 0.692 37.5 4.50 7.30
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
$contrasts
contrast estimate SE df lower.CL upper.CL
Sea - Low 0.984 0.521 24.5 -0.314 2.28
Sea - Medium 0.374 0.636 25.8 -1.207 1.95
Low - Medium -0.610 0.718 26.9 -2.390 1.17
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
Conf-level adjustment: tukey method for comparing a family of 3 estimates
print (emm)$emmeans
altitude_code emmean SE df lower.CL upper.CL
Sea 6.27 0.543 25.5 5.15 7.39
Low 5.29 0.600 31.9 4.06 6.51
Medium 5.90 0.692 37.5 4.50 7.30
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
Sea - Low 0.984 0.521 24.5 1.890 0.1628
Sea - Medium 0.374 0.636 25.8 0.588 0.8279
Low - Medium -0.610 0.718 26.9 -0.851 0.6752
Degrees-of-freedom method: kenward-roger
P value adjustment: tukey method for comparing a family of 3 estimates
From the outputs provided from the code above, the following observations can be made:
Sea Level vs Low Altitude: Players at low altitude recorded 0.984 m/min less HSR compared to sea level (P=0.1628)
Sea Level vs Medium Altitude: Players at medium altitude recorded 0.374 m/min less HSR compared to sea level (P=0.8279)
Low vs Medium Altitude: Players at medium altitude recorded 0.610 m/min more HSR compared to low altitude (P=0.6752)
In all comparisons, the P-values exceeded 0.05, indicating no statistically significant differences in HSR (m/min) between altitude categories. These findings are consistent with previous research suggesting that altitude has minimal impact on HSR performance (Draper et al. 2022). One potential explanation is that HSR primarily relies on anaerobic energy systems, which are less dependent on oxygen availability (Peronnet, Thibault, and Cousineau 1991). Given that altitude-induced hypoxia predominantly affects aerobic capacity, its influence is more pronounced during sustained aerobic efforts than during short, high-intensity bursts (Peronnet, Thibault, and Cousineau 1991). Therefore, it may have been more informative to analyse the effects of altitude on aerobic-based metrics, such as distance covered during sustained periods of possession.
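The pairwise differences reported by `emmeans` can be traced back to the fixed effects in the model summary. Because altitude is the only fixed effect and sea level is the reference category, each contrast is simple arithmetic on the two coefficients. The sketch below copies the estimates from the output above; the variable names are my own.

```r
# Fixed-effect estimates copied from summary(l_model1); Sea is the reference level
b_low    <- -0.9842  # altitude_codeLow
b_medium <- -0.3737  # altitude_codeMedium

# Pairwise differences in estimated HSR (m/min)
contrasts_hsr <- c(
  "Sea - Low"    = -b_low,
  "Sea - Medium" = -b_medium,
  "Low - Medium" = b_low - b_medium
)
print(round(contrasts_hsr, 3))
```

The three values agree with the `estimate` column of the contrasts table to the precision shown there.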
Following these findings, individual trends were investigated by plotting each player’s HSR (m/min) against altitude.
#Checking individual trends by plotting each player’s HSR vs. altitude
ggplot(match_data, aes(x = altitude_code, y = hs_rmin, group = id, color = id)) +
geom_line() +
geom_point() +
theme_classic() +
labs(title = "Player-Specific Trends in HSR Across Altitudes",
x = "Altitude Category",
y = "High-Speed Running (m/min)") +
theme(legend.position = "none")
This graph was of limited use because few individuals had data reported for all three altitudes. A key limitation of the data used in this report is the lack of completeness, which restricts the depth of analysis. To strengthen future investigations, increasing participant numbers and ensuring consistent data compliance are essential.
Following all the above findings, I decided to combine low and medium altitude in order to compare data at sea level with altitude (>1000m), to try to reduce noise and make the comparisons more focused.
#combining low and medium altitude to compare to sea level to reduce noise and make comparison more focused
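The completeness issue can be made concrete by counting how many altitude categories each player appears in. This is a minimal sketch on an invented data frame (the player labels and rows are hypothetical), using the same column names as `match_data`.

```r
# Hypothetical match-day records: one row per player per match
toy <- data.frame(
  id            = c("a", "a", "a", "b", "b", "c", "d", "d"),
  altitude_code = c("Sea", "Low", "Medium", "Sea", "Sea", "Low", "Sea", "Medium")
)

# Number of distinct altitude categories observed for each player
n_categories <- tapply(toy$altitude_code, toy$id, function(x) length(unique(x)))
print(n_categories)

# How many players have data in one, two, or all three categories
print(table(n_categories))
```

Applied to the real data, the final table would show directly how few players contribute a line spanning all three altitudes.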
match_data <- match_data %>%
mutate(altitude_group = ifelse(altitude_code == "Sea", "Sea", "Altitude"))
view(match_data)
I repeated the graph previously presented, but this time comparing sea level and altitude rather than three different altitude categories.
#boxplot to show HSR(m/min) at sea level and altitude
ggplot(match_data, aes(x = altitude_group, y = hs_rmin, fill = altitude_group)) +
geom_boxplot() +
geom_point(color = "red") +
scale_fill_manual(
values = c("Sea" = "#011925", "Altitude" = "#c3a871") # Sea and Altitude colors
) +
theme_classic() +
theme(legend.position = "none") + # Remove the legend
labs(
x = "Altitude Group",
y = "High-Speed Running (m/min)",
title = "Comparison of HSR/min Between Sea Level and Altitude"
)
Following this, HSR (m/min) was analysed against the two altitude groups (sea level and altitude) whilst taking into account the different participants in each category.
#linear mixed model with new altitude groups
l_model2 <- lmer(hs_rmin ~ altitude_group + (1 | id), data = match_data)
summary(l_model2)
Linear mixed model fit by REML ['lmerMod']
Formula: hs_rmin ~ altitude_group + (1 | id)
Data: match_data
REML criterion at convergence: 169.4
Scaled residuals:
Min 1Q Median 3Q Max
-1.3712 -0.5724 -0.1238 0.4642 1.9179
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 3.567 1.889
Residual 1.648 1.284
Number of obs: 42, groups: id, 20
Fixed effects:
Estimate Std. Error t value
(Intercept) 5.5190 0.5242 10.529
altitude_groupSea 0.7564 0.4439 1.704
Correlation of Fixed Effects:
(Intr)
altitd_grpS -0.403
#intercept-only (null) model with no altitude term, fitted for comparison with l_model2
#similar estimates with and without the altitude term would suggest altitude is not changing within-player HSR (m/min)
l_model3 <- lmer(hs_rmin ~ 1 + (1 | id), data = match_data)
summary(l_model3)
Linear mixed model fit by REML ['lmerMod']
Formula: hs_rmin ~ 1 + (1 | id)
Data: match_data
REML criterion at convergence: 172.4
Scaled residuals:
Min 1Q Median 3Q Max
-1.68343 -0.57864 -0.07324 0.53997 2.03570
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 3.472 1.863
Residual 1.792 1.339
Number of obs: 42, groups: id, 20
Fixed effects:
Estimate Std. Error t value
(Intercept) 5.8744 0.4791 12.26
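An intercept-only model and a model with the altitude term can be compared formally with a likelihood-ratio test. The sketch below is not part of the original analysis: it runs on simulated data (the player effects, noise levels and object names are assumptions), and only shows the shape of the comparison using `anova()` from `lme4`.

```r
library(lme4)

# Simulated data for illustration only: 20 players, 3 matches each
set.seed(1)
toy <- data.frame(
  id             = factor(rep(1:20, each = 3)),
  altitude_group = factor(rep(c("Sea", "Altitude", "Sea"), times = 20))
)
player_effect <- rnorm(20, sd = 2)[as.integer(toy$id)]
toy$hs_rmin <- 6 + player_effect + rnorm(nrow(toy), sd = 1.3)

# Null model (intercept only) and model with the altitude term
m_null     <- lmer(hs_rmin ~ 1 + (1 | id), data = toy)
m_altitude <- lmer(hs_rmin ~ altitude_group + (1 | id), data = toy)

# anova() refits both models with maximum likelihood before comparing them
lrt <- anova(m_null, m_altitude)
print(lrt)
```

The `Pr(>Chisq)` column of the result gives the p-value for adding the altitude term; the same call on the report's two models would be `anova(l_model3, l_model2)`.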
# comparing between sea level and altitude
emm2 <- emmeans(l_model2, pairwise ~ altitude_group)
emm2$emmeans
altitude_group emmean SE df lower.CL upper.CL
Altitude 5.52 0.526 25.1 4.44 6.60
Sea 6.28 0.536 26.0 5.17 7.38
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
Altitude - Sea -0.756 0.45 25.4 -1.679 0.1053
Degrees-of-freedom method: kenward-roger
#confidence interval (95%) from estimated marginal means
confint(emm2)$emmeans
altitude_group emmean SE df lower.CL upper.CL
Altitude 5.52 0.526 25.1 4.44 6.60
Sea 6.28 0.536 26.0 5.17 7.38
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
$contrasts
contrast estimate SE df lower.CL upper.CL
Altitude - Sea -0.756 0.45 25.4 -1.68 0.171
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
On average players ran 0.756 m/min less at altitude compared to sea level. However, this difference was not statistically significant (P=0.1053). The confidence interval crossing zero further supports the lack of a statistically significant effect. While the P-value of 0.1053 exceeds the conventional threshold of 0.05, it is closer to significance than the comparison across all three conditions. This may indicate that altitude has some effect on HSR (m/min), but the current sample size or variability within the data may be limiting the ability to detect a statistically significant difference.
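The confidence interval for the difference can be reproduced by hand from the contrast estimate, its standard error and the degrees of freedom. The sketch below copies those three values from the `emmeans` output above; small discrepancies in the last digit come from the rounded standard error.

```r
# Values copied from the emmeans contrast output (Altitude - Sea)
estimate <- -0.756
se       <- 0.45
df       <- 25.4

# Two-sided 95% interval: estimate +/- critical t value * standard error
t_crit <- qt(0.975, df)
ci <- c(lower = estimate - t_crit * se, upper = estimate + t_crit * se)
print(round(ci, 2))

# TRUE when the interval spans zero, in line with the non-significant p-value
print(ci[["lower"]] < 0 && ci[["upper"]] > 0)
```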
Similar to before, I then checked the individual trends among the players.
#Checking individual trends by plotting each player’s HSR vs. altitude
ggplot(match_data, aes(x = altitude_group, y = hs_rmin, group = id, color = id)) +
geom_line(alpha = 0.6) + # Connects points for each player
geom_point(size = 3) + # Shows individual data points
theme_classic() +
labs(title = "Individual HSR Trends Across Altitude",
x = "Altitude Category",
y = "High-Speed Running (m/min)") +
theme(legend.position = "none")
Research shows that reductions in HSR are commonly observed during the second half of match play (Sparks, Coetzee, and Gabbett 2016). Consequently, further research and analysis should consider separating HSR data by halves to provide more detailed insights. This approach could also support conclusions regarding the impact of fatigue, particularly if rating of perceived exertion were collected more frequently during match play.
Furthermore, previous research has demonstrated that high-intensity actions, such as HSR, are key determinants of goal-scoring opportunities (Faude, Koch, and Meyer 2012). Therefore, if match outcomes had been available, further analysis could have explored whether variation in HSR outputs influenced match results.
Following the findings above, and due to the paucity of research, the effects of altitude on post-match soreness were investigated.
Mean soreness and mean HSR (m/min) per player on match days were calculated. Soreness data were collected the day after each match.
match_data <- data %>%
filter(gd == "MD") # Keep only Match Day data
soreness_1 <- match_data %>%
group_by(id, altitude_code) %>% # Grouped by player ID and altitude code
summarize(
mean_soreness_1 = mean(soreness_1, na.rm = TRUE), # Mean soreness per player
mean_hs_rmin = mean(hs_rmin, na.rm = TRUE) # Mean high-speed running per player
)
`summarise()` has grouped output by 'id'. You can override using the `.groups`
argument.
summary(soreness_1)
id altitude_code mean_soreness_1 mean_hs_rmin
10 : 3 Sea :14 Min. :2.000 Min. : 2.611
109 : 3 Low :11 1st Qu.:4.000 1st Qu.: 4.478
2 : 2 Medium: 8 Median :6.000 Median : 5.668
8 : 2 Mean :5.462 Mean : 5.795
27 : 2 3rd Qu.:6.833 3rd Qu.: 6.875
64 : 2 Max. :8.000 Max. :11.778
(Other):19 NA's :2
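The `summarise()` message and the `NA's` entry in the summary can both be reproduced on a small example. This is a sketch on invented rows (the players and values are hypothetical): `.groups = "drop"` returns an ungrouped result and silences the message, and a player with no soreness rating in a category yields `NaN`, which `summary()` counts under `NA's`.

```r
library(dplyr)

# Hypothetical match-day rows; values are invented
toy <- tibble(
  id            = c("a", "a", "b", "b"),
  altitude_code = c("Sea", "Sea", "Sea", "Low"),
  soreness_1    = c(6, 7, NA, 5),
  hs_rmin       = c(5.2, 6.1, 4.8, 5.5)
)

per_player <- toy %>%
  group_by(id, altitude_code) %>%
  summarise(
    mean_soreness_1 = mean(soreness_1, na.rm = TRUE),
    mean_hs_rmin    = mean(hs_rmin, na.rm = TRUE),
    .groups = "drop"  # return an ungrouped tibble and suppress the message
  )
print(per_player)
```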
The effect of altitude on soreness over the three different altitude groups was then visually displayed.
# Create a boxplot comparing soreness across all three altitude groups (Sea, Low, Medium)
ggplot(soreness_1, aes(x = altitude_code, y = mean_soreness_1, fill = altitude_code)) +
geom_boxplot() + # Create the boxplot
geom_point(color = "red", size = 2, position = position_dodge(width = 0.75)) + # Add red points in a straight line
scale_fill_manual(
values = c("Sea" = "#011925",
"Low" = "#418cdd",
"Medium" = "#c3a871") # Assign different colors to each altitude
) +
theme_classic() + # Use classic theme
theme(legend.position = "none") + # Remove the legend
labs(
x = "Altitude Code", # Label for the x-axis
y = "Mean Soreness", # Label for the y-axis
fill = "Altitude Group", # Legend title
title = "Comparison of Soreness Across Sea, Low, and Medium Altitudes"
)
Statistical tests were then run to determine whether the data showed real, meaningful patterns rather than random noise.
# Fit a mixed-effects model (altitude_code as fixed effect, id as random effect)
lm_model_mixed <- lmer(mean_soreness_1 ~ altitude_code + (1 | id), data = soreness_1)
boundary (singular) fit: see help('isSingular')
summary(lm_model_mixed)
Linear mixed model fit by REML ['lmerMod']
Formula: mean_soreness_1 ~ altitude_code + (1 | id)
Data: soreness_1
REML criterion at convergence: 95.8
Scaled residuals:
Min 1Q Median 3Q Max
-2.7707 -0.2348 -0.1057 0.6148 1.5849
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 0.0 0.000
Residual 1.4 1.183
Number of obs: 31, groups: id, 19
Fixed effects:
Estimate Std. Error t value
(Intercept) 6.277778 0.341506 18.383
altitude_codeLow -0.005051 0.493817 -0.010
altitude_codeMedium -3.152778 0.539968 -5.839
Correlation of Fixed Effects:
(Intr) altt_L
altitd_cdLw -0.692
alttd_cdMdm -0.632 0.437
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
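The singular-fit message reflects the random-intercept variance being estimated at zero, as shown in the random effects table above. A minimal sketch of how this could be checked, and compared against an ordinary linear model, is given below. It uses simulated data and hypothetical object names, so whether the check returns `TRUE` here will depend on the simulated values.

```r
library(lme4)

# Simulated data for illustration only; it is not the report's data
set.seed(42)
toy <- data.frame(
  id            = factor(rep(1:10, each = 3)),
  altitude_code = factor(rep(c("Sea", "Low", "Medium"), times = 10),
                         levels = c("Sea", "Low", "Medium"))
)
toy$soreness <- 6 - 3 * (toy$altitude_code == "Medium") + rnorm(nrow(toy))

fit_mixed <- lmer(soreness ~ altitude_code + (1 | id), data = toy)

# TRUE when a variance component is estimated at (or very near) zero
singular <- isSingular(fit_mixed)
print(singular)

# Side-by-side fixed effects from the mixed model and from ordinary least squares
fit_lm <- lm(soreness ~ altitude_code, data = toy)
print(cbind(mixed = fixef(fit_mixed), lm = coef(fit_lm)))
```

When the random-intercept variance is zero, the mixed model carries no more information about between-player differences than the ordinary linear model does.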
# Calculate the estimated marginal means (EMMs) for each altitude code
emm_result <- emmeans(lm_model_mixed, pairwise ~ altitude_code)
summary(emm_result)$emmeans
altitude_code emmean SE df lower.CL upper.CL
Sea 6.28 0.352 28 5.56 7.00
Low 6.27 0.369 28 5.52 7.03
Medium 3.12 0.439 28 2.23 4.02
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
Sea - Low 0.00505 0.509 19.5 0.010 0.9999
Sea - Medium 3.15278 0.562 23.6 5.606 <.0001
Low - Medium 3.14773 0.575 25.1 5.471 <.0001
Degrees-of-freedom method: kenward-roger
P value adjustment: tukey method for comparing a family of 3 estimates
# Confidence intervals (95%) for the EMMs
confint(emm_result)$emmeans
altitude_code emmean SE df lower.CL upper.CL
Sea 6.28 0.352 28 5.56 7.00
Low 6.27 0.369 28 5.52 7.03
Medium 3.12 0.439 28 2.23 4.02
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
$contrasts
contrast estimate SE df lower.CL upper.CL
Sea - Low 0.00505 0.509 19.5 -1.28 1.29
Sea - Medium 3.15278 0.562 23.6 1.75 4.56
Low - Medium 3.14773 0.575 25.1 1.71 4.58
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
Conf-level adjustment: tukey method for comparing a family of 3 estimates
print(emm_result)$emmeans
altitude_code emmean SE df lower.CL upper.CL
Sea 6.28 0.352 28 5.56 7.00
Low 6.27 0.369 28 5.52 7.03
Medium 3.12 0.439 28 2.23 4.02
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
Sea - Low 0.00505 0.509 19.5 0.010 0.9999
Sea - Medium 3.15278 0.562 23.6 5.606 <.0001
Low - Medium 3.14773 0.575 25.1 5.471 <.0001
Degrees-of-freedom method: kenward-roger
P value adjustment: tukey method for comparing a family of 3 estimates
The results show that the differences between sea level and medium altitude, and between low and medium altitude, are statistically significant (P<0.0001). In contrast, the difference between sea level and low altitude is not significant (P=0.9999). Estimated marginal means indicate that average soreness scores at sea level and low altitude are nearly identical (6.28 and 6.27, respectively), whereas medium altitude is associated with a significantly lower average score of 3.12 (95% CI 2.23 to 4.02). This suggests that playing at altitudes above 2000m reduces soreness scores by approximately 3 AU on the Likert Scale relative to sea level (a lower score indicating greater soreness), with a Tukey-adjusted 95% confidence interval for the difference of 1.75 to 4.56 AU.
As with HSR, low and medium altitude were then combined to form ‘altitude’ so that sea level could be compared directly with altitude.
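The interval for the sea level versus medium altitude difference reported by `confint()` can be reproduced from the contrast estimate, its standard error and the degrees of freedom. Because three categories are compared, the critical value comes from the studentized range distribution rather than the t distribution. The values below are copied from the contrasts table; the variable names are my own.

```r
# Values copied from the emmeans contrast output (Sea - Medium)
estimate <- 3.15278
se       <- 0.562
df       <- 23.6
k        <- 3  # number of altitude categories being compared

# Tukey-adjusted critical value: studentized range quantile divided by sqrt(2)
crit <- qtukey(0.95, nmeans = k, df = df) / sqrt(2)
ci <- c(lower = estimate - crit * se, upper = estimate + crit * se)
print(round(ci, 2))
```

This is the interval for the difference between conditions, which is wider than, and distinct from, the interval around the medium-altitude mean itself.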
# Recode altitude_code to combine 'Low' and 'Medium' into 'Altitude', keeping 'Sea' as 'Sea'
soreness_1_combined <- soreness_1 %>%
mutate(altitude_code = ifelse(altitude_code == "Sea", "Sea", "Altitude"))
I then viewed this as a boxplot.
# Reorder factor levels so that 'Sea' is on the left and 'Altitude' is on the right
soreness_1_combined <- soreness_1_combined %>%
mutate(altitude_code = factor(altitude_code, levels = c("Sea", "Altitude")))
# Create a boxplot comparing soreness between Sea and combined Altitude (Low and Medium)
ggplot(soreness_1_combined, aes(x = altitude_code, y = mean_soreness_1, fill = altitude_code)) +
geom_boxplot() + # Create the boxplot
geom_point(color = "red", size = 2, position = position_dodge(width = 0.75)) + # Add red points in a straight line
scale_fill_manual(
values = c("Sea" = "#011925",
"Altitude" = "#c3a871") # Combine Low and Medium into 'Altitude'
) +
theme_classic() + # Use classic theme
labs(
x = "Altitude Group", # Label for the x-axis
y = "Mean Soreness", # Label for the y-axis
fill = "Altitude Group", # Legend title
title = "Comparison of Soreness Between Sea Level and Altitude (Low and Medium Combined)"
)
I then ran further statistical tests.
# Fit a mixed-effects model (altitude_code as fixed effect, id as random effect)
lm_model_combined_random <- lmer(mean_soreness_1 ~ altitude_code + (1 | id), data = soreness_1_combined)
boundary (singular) fit: see help('isSingular')
summary(lm_model_combined_random)
Linear mixed model fit by REML ['lmerMod']
Formula: mean_soreness_1 ~ altitude_code + (1 | id)
Data: soreness_1_combined
REML criterion at convergence: 118.9
Scaled residuals:
Min 1Q Median 3Q Max
-1.91369 -0.55311 0.03073 0.61457 1.78224
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 0.000 0.000
Residual 2.934 1.713
Number of obs: 31, groups: id, 19
Fixed effects:
Estimate Std. Error t value
(Intercept) 6.2778 0.4944 12.697
altitude_codeAltitude -1.3304 0.6316 -2.107
Correlation of Fixed Effects:
(Intr)
alttd_cdAlt -0.783
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
# Calculate the estimated marginal means (EMMs) for the combined altitude groups (Sea vs. Altitude)
emm_result_combined <- emmeans(lm_model_combined_random, pairwise ~ altitude_code)
summary(emm_result_combined)$emmeans
altitude_code emmean SE df lower.CL upper.CL
Sea 6.28 0.509 29.0 5.24 7.32
Altitude 4.95 0.408 25.8 4.11 5.79
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
Sea - Altitude 1.33 0.651 21 2.045 0.0537
Degrees-of-freedom method: kenward-roger
# Confidence intervals (95%) for the EMMs
confint(emm_result_combined)$emmeans
altitude_code emmean SE df lower.CL upper.CL
Sea 6.28 0.509 29.0 5.24 7.32
Altitude 4.95 0.408 25.8 4.11 5.79
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
$contrasts
contrast estimate SE df lower.CL upper.CL
Sea - Altitude 1.33 0.651 21 -0.023 2.68
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
print(emm_result_combined)$emmeans
altitude_code emmean SE df lower.CL upper.CL
Sea 6.28 0.509 29.0 5.24 7.32
Altitude 4.95 0.408 25.8 4.11 5.79
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
Sea - Altitude 1.33 0.651 21 2.045 0.0537
Degrees-of-freedom method: kenward-roger
When comparing sea level to altitude, with altitude including all matches played at or above 1000m, the changes in soreness were not statistically significant (P=0.0537). However, as the p-value is very close to the significance threshold of 0.05, it suggests there is a possible effect that could become statistically significant with a larger sample size or more data points. This implies that the current sample may not have enough power to detect the effect conclusively. Further analysis could provide more clarity and confirm if this trend reflects a true difference in soreness between the two conditions.
The results suggest that soreness scores are, on average, 1.33 AU lower at altitude than at sea level, which on this scale indicates greater soreness. The 95% confidence interval for this difference runs from -0.023 to 2.68 AU, so it narrowly includes zero while its upper bound is substantial. This should be considered when programming and incorporating recovery strategies, as a change of 1 AU on the Likert Scale is deemed the smallest worthwhile change.
There is limited research exploring the effect of altitude on soreness levels. However, the findings from this analysis align with those of Rojas-Valverde et al. (2019), who observed increased levels of delayed-onset muscle soreness (DOMS) at higher altitudes (Rojas-Valverde et al. 2019). It is important to note, however, that Rojas-Valverde et al. (2019) also included heat as a variable in their study, which could have influenced their results. Therefore, while our findings suggest a similar trend, further research isolating altitude as the primary factor is needed to confirm the relationship between altitude and soreness levels.
While our findings show an increase in soreness at altitude, research has reported that higher altitude hypoxia does not significantly affect muscle function (Edwards et al. 2010), which implies that other factors might be contributing to the observed soreness. Given this, it is possible that environmental stressors, such as temperature, humidity, or the body’s acclimatisation to altitude, rather than altitude alone, are influencing soreness levels. Again, more controlled studies isolating altitude as the primary factor are needed to confirm its direct impact on soreness. Furthermore, monitoring soreness over the next few days following a match would be useful to analyse how long it takes for soreness levels to normalise between individuals.
A graph was generated to plot mean match-day soreness and HSR (m/min) with altitude.
ggplot(soreness_1, aes(x = mean_soreness_1, y = mean_hs_rmin, color = altitude_code)) +
geom_point(size = 3, alpha = 0.8) + # Scatter points
geom_smooth(method = "lm", se = FALSE) + # Add trend lines per altitude
scale_color_manual(
values = c(Sea = "#082143", Low = "#283FCB", Medium = "#E6D23E")
) +
theme_minimal() +
labs(
x = "Mean Soreness Match Day +1",
y = "Mean High-Speed Running (m/min)",
color = "Altitude Level",
title = "Relationship Between Soreness and High-Speed Running at Different Altitudes"
)
`geom_smooth()` using formula = 'y ~ x'
The trend lines show that at medium altitude and sea level, the higher the HSR (m/min) output, the higher the post-match soreness; however, at low altitude the opposite was observed, with greater HSR (m/min) relating to lower reported soreness.
From the code and findings above, I then generated an interactive dashboard to present the key messages to the coaches.
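The direction of each trend line can be checked numerically, since `geom_smooth(method = "lm")` with a colour grouping fits one simple linear regression per altitude category. The sketch below does the same on simulated per-player means (the values and the built-in positive relationship are assumptions for illustration only).

```r
# Simulated per-player means for illustration only
set.seed(7)
toy <- data.frame(
  altitude_code   = rep(c("Sea", "Low", "Medium"), each = 8),
  mean_soreness_1 = round(runif(24, min = 2, max = 8))
)
toy$mean_hs_rmin <- 5 + 0.2 * toy$mean_soreness_1 + rnorm(24)

# One simple linear regression per altitude category, as the plot's trend lines do
slopes <- sapply(split(toy, toy$altitude_code), function(d) {
  coef(lm(mean_hs_rmin ~ mean_soreness_1, data = d))[["mean_soreness_1"]]
})
print(round(slopes, 2))
```

A positive slope means HSR (m/min) rises with the soreness score in that category and a negative slope means it falls; with only a handful of points per category, the sign of a slope can change with a single observation.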
# UI ----
ui <- dashboardPage(
dashboardHeader(title = "Matchday Altitude Dashboard"),
dashboardSidebar(
sidebarMenu(
menuItem("Boxplot Analysis", tabName = "boxplot", icon = icon("chart-bar")),
menuItem("Lollipop Graph", tabName = "lollipop", icon = icon("chart-line"))
)
),
dashboardBody(
tabItems(
# 📌 Boxplot Tab ----
tabItem(tabName = "boxplot",
selectInput("view_option", "View Option:",
choices = c("Single Metric", "Both Metrics"),
selected = "Single Metric"),
selectInput("metric", "Select Metric:",
choices = c("HSR per min" = "hsr", "Soreness" = "soreness"),
selected = "hsr"),
fluidRow(
box(plotOutput("altitude_plot"), width = 12)
),
# Conditional Summary Box (Only for Both Metrics)
conditionalPanel(
condition = "input.view_option == 'Both Metrics'",
box(
title = "What Impact does Altitude have on Matchday HSR (m/min) and Soreness?",
width = 12,
textOutput("summary_text")
)
)
),
# 📌 Lollipop Graph Tab ----
tabItem(tabName = "lollipop",
selectInput("view_option_lollipop", "View Option:",
choices = c("Single Metric", "Both Metrics"),
selected = "Single Metric"),
selectInput("lollipop_metric", "Select Metric:",
choices = c("HSR per min" = "hsr", "Soreness" = "soreness"),
selected = "hsr"),
# Single Metric Lollipop Plot
conditionalPanel(
condition = "input.view_option_lollipop == 'Single Metric'",
box(plotOutput("lollipop_plot"), width = 12)
),
# Both Metrics Lollipop Plots Side by Side
conditionalPanel(
condition = "input.view_option_lollipop == 'Both Metrics'",
fluidRow(
box(plotOutput("lollipop_plot_hsr"), width = 6),
box(plotOutput("lollipop_plot_soreness"), width = 6)
),
# Conditional Summary Box for Lollipop Graph
box(
title = "What Impact does Altitude have on Matchday HSR (m/min) and Soreness Across Individuals?",
width = 12,
textOutput("summary_text_lollipop")
)
)
)
)
)
)
# Server ----
server <- function(input, output) {
# Convert data to long format for boxplot
soreness_1_long <- soreness_1 %>%
pivot_longer(cols = c(mean_hs_rmin, mean_soreness_1), names_to = "Metric", values_to = "Value") %>%
mutate(Metric = recode(Metric, mean_hs_rmin = "HSR (m/min)", mean_soreness_1 = "Soreness (1-10)"))
output$altitude_plot <- renderPlot({
if (input$view_option == "Single Metric") {
metric_column <- ifelse(input$metric == "hsr", "mean_hs_rmin", "mean_soreness_1")
metric_label <- ifelse(input$metric == "hsr", "Matchday Mean High-Speed Running (m/min)", "Matchday Mean Soreness")
ggplot(soreness_1, aes(x = altitude_code, y = .data[[metric_column]], fill = altitude_code)) +
geom_boxplot(outlier.shape = NA) +
geom_point(color = "red", size = 2, alpha = 0.8, position = position_nudge(x = 0)) +
scale_fill_manual(values = c(Sea = "#011925", Low = "#418cdd", Medium = "#c3a871")) +
scale_x_discrete(labels = c("Sea" = "Sea", "Low" = "Low (1000-2000m)", "Medium" = "Medium (>2000m)")) +
theme_classic() + theme(legend.position = "none") +
labs(
x = "Altitude Category",
y = metric_label,
fill = "Altitude Level",
title = paste("Impact of Altitude on", metric_label)
)
} else {
ggplot(soreness_1_long, aes(x = altitude_code, y = Value, fill = altitude_code)) +
geom_boxplot(outlier.shape = NA) +
geom_point(color = "red", size = 2, alpha = 0.8, position = position_nudge(x = 0)) +
facet_wrap(~ Metric, scales = "free") +
scale_fill_manual(values = c(
"Sea" = "#011925",
"Low" = "#418cdd",
"Medium" = "#c3a871"
)) +
scale_x_discrete(labels = c("Sea" = "Sea (0-1000m)", "Low" = "Low (1000-2000m)", "Medium" = "Medium (>2000m)")) +
theme_classic() + theme(legend.position = "none") +
labs(
x = "Altitude Category",
y = "",
fill = "Altitude Level",
title = "Impact of Altitude on Matchday HSR (m/min) and Soreness"
)
}
})
# Compute individual differences for Lollipop Graph
individual_differences <- soreness_1 %>%
group_by(id) %>%
summarize(
hsr_diff = mean(mean_hs_rmin[altitude_code != "Sea"], na.rm = TRUE) - mean(mean_hs_rmin[altitude_code == "Sea"], na.rm = TRUE),
soreness_diff = mean(mean_soreness_1[altitude_code != "Sea"], na.rm = TRUE) - mean(mean_soreness_1[altitude_code == "Sea"], na.rm = TRUE)
) %>%
pivot_longer(cols = c(hsr_diff, soreness_diff), names_to = "Metric", values_to = "Difference") %>%
mutate(Metric = recode(Metric, hsr_diff = "HSR per min", soreness_diff = "Soreness"))
render_lollipop <- function(metric_label) {
plot_data <- individual_differences %>%
filter(Metric == metric_label, !is.na(Difference))
ggplot(plot_data, aes(x = Difference, y = reorder(id, Difference))) +
geom_rect(aes(xmin = -1, xmax = 1, ymin = -Inf, ymax = Inf), fill = "gray80", alpha = 0.3) +
geom_segment(aes(xend = 0, yend = reorder(id, Difference)), color = "black") + # same ordering as y so each stem meets its point
geom_point(size = 4, color = "red") +
geom_vline(xintercept = 0, linetype = "dashed", color = "black") +
theme_classic() + theme(legend.position = "none") +
labs(x = "Difference from Sea Level (0-1000m) to Altitude (>1000m)", y = "Player ID", title = ifelse(metric_label == "HSR per min",
"Individual Differences in Matchday HSR (m/min) from Sea Level to Altitude",
paste("Individual Differences in", metric_label, "from Sea Level to Altitude")))
}
# Render Single Lollipop Graph
output$lollipop_plot <- renderPlot({
render_lollipop(ifelse(input$lollipop_metric == "hsr", "HSR per min", "Soreness"))
})
# Render Both Lollipop Graphs Side by Side
output$lollipop_plot_hsr <- renderPlot({ render_lollipop("HSR per min") })
output$lollipop_plot_soreness <- renderPlot({ render_lollipop("Soreness") })
# Create summary text for findings (Boxplot)
output$summary_text <- renderText({
if (input$view_option == "Both Metrics") {
return("The boxplots show no significant differences in matchday high-speed running (m/min) across the altitude categories, suggesting altitude has minimal impact on matchday running output. However, next-day soreness ratings were lower at medium altitude (>2000m), decreasing from approximately 6/10 to 3/10. Because a lower rating on this scale reflects greater soreness, this indicates players experienced greater muscle soreness at altitudes above 2000m. On average, soreness scores decreased by 3.15 AU on the Likert scale at medium altitude, which exceeds the smallest worthwhile change. Furthermore, this reduction is estimated to lie between 2.23 and 4.02 AU, suggesting a meaningful impact. This increased soreness (observed via a reduction in score on the Likert scale) may reflect greater muscle stress due to lower atmospheric pressure and reduced oxygen availability. These findings highlight the importance of enhanced recovery strategies - such as extended recovery time - when playing at higher altitudes. Overall, the data suggest that while altitude may not drastically alter match-day running performance, it can influence fatigue and recovery.")
} else {
metric_label <- ifelse(input$metric == "hsr", "HSR per min", "Soreness")
return(paste("The boxplot shows the distribution of", metric_label, "across the altitude categories. Further detailed analysis is required to draw conclusions about the impact at each altitude level."))
}
})
# Create summary text for findings (Lollipop)
output$summary_text_lollipop <- renderText({
if (input$view_option_lollipop == "Both Metrics") {
return("When exploring individual responses, 6 out of 10 players showed either an increase in matchday HSR (m/min) accompanied by increased soreness, or a decrease in both HSR (m/min) and soreness at altitude. In contrast, the remaining 4 players exhibited mismatched trends, displaying increased soreness despite a reduction in HSR (m/min) at altitude. Moreover, it is important to note that no player increased HSR (m/min) or reported less soreness beyond the smallest worthwhile change (+1), whereas the HSR (m/min) decreases and soreness increases were much larger in magnitude. This highlights the individual variability in adaptation to altitude and the importance of monitoring both internal and external load, further supporting the importance of obtaining soreness data at altitude.")
} else {
metric_label <- ifelse(input$lollipop_metric == "hsr", "HSR per min", "Soreness")
return(paste("The analysis of", metric_label, "individual differences at different altitudes suggests significant changes in performance and discomfort. Further exploration may be needed to assess the implications of these changes at different altitudes."))
}
})
}
# Run App ----
shinyApp(ui = ui, server = server)
In relation to the lollipop graphs displayed in this dashboard, the grey shaded region spans from -1 to 1, representing the zone of trivial change - a range within which differences are not considered practically meaningful. This is based on the concept of the smallest worthwhile change (SWC), which, for performance, is often set at 0.2 times the between-subject standard deviation (Fang and Ho 2020). By using a fixed range such as -1 to 1, we provide a visual threshold for interpreting whether observed differences exceed what could be attributed to natural variability or measurement noise.
Research has shown that match-to-match variability in high-speed running is approximately 16% (Gregson et al. 2010). Therefore, to calculate the SWC in high-speed running, 16% of the mean HSR was used, resulting in a value of 0.93. As a result, a threshold of ±1 was applied.
In relation to soreness, the SWC was calculated by taking the mean match-day soreness across the three altitude categories and multiplying it by 0.2: (6.28 + 6.27 + 3.12) / 3 = 5.22, and 5.22 x 0.2 = 1.04. Therefore, a threshold of ±1 was utilised for soreness.
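The soreness threshold can be checked with a few lines of R. This is a minimal sketch rather than code from the dashboard: the helper name `swc_from_mean` is hypothetical, and the three means are the values quoted in the text. The same helper would give the HSR threshold with `fraction = 0.16`, but the mean HSR is not stated in this report, so that call is left out.

```r
# Hypothetical helper (not part of the dashboard code):
# SWC taken as a fixed fraction of a mean value.
swc_from_mean <- function(mean_value, fraction) {
  mean_value * fraction
}

# Mean match-day soreness for the three altitude categories, as quoted in the text.
soreness_means <- c(6.28, 6.27, 3.12)

overall_soreness <- mean(soreness_means)
soreness_swc <- swc_from_mean(overall_soreness, fraction = 0.2)

round(overall_soreness, 2)  # 5.22
round(soreness_swc, 2)      # 1.04, rounded to a plotting threshold of 1
```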
From the lollipop graphs, it is important to note that no player increased HSR (m/min) or reported less soreness beyond the smallest worthwhile change (+1), whereas the HSR (m/min) decreases and soreness increases were much larger in magnitude. The data suggest that, whilst there was no meaningful improvement in performance or recovery (soreness reduction), some players experienced a notable decline in performance (HSR) and an increase in soreness at higher altitudes. This could indicate that altitude negatively affects some players' physical performance and recovery, with no positive changes (increase in HSR or decrease in soreness) observed beyond the minimal threshold.
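The reading of the lollipop graphs described above amounts to labelling each player's difference against the ±1 band. A minimal sketch of that logic follows; `classify_change` is a hypothetical helper, and the example differences are made up for illustration and are not the study's data. Note that for soreness a lower score means more soreness, so "below SWC" corresponds to a meaningful increase in soreness.

```r
# Hypothetical helper: label a difference (altitude minus sea level)
# relative to a symmetric smallest-worthwhile-change threshold.
classify_change <- function(difference, swc = 1) {
  ifelse(difference > swc, "above SWC",
         ifelse(difference < -swc, "below SWC", "trivial"))
}

# Made-up differences for illustration only; these are not the study's data.
example_diffs <- c(-3.2, -0.4, 0.8, 1.6)

classify_change(example_diffs)
# "below SWC" "trivial" "trivial" "above SWC"
```

Applied to the `Difference` column of `individual_differences`, the same function would flag which players fall outside the grey band in each panel.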
Practical Application:
Impact on Match-Day Performance
High-Speed Running (m/min) shows no significant difference across altitude categories, suggesting that performance during match play is not heavily impacted by altitude.
Recovery and Soreness
Soreness ratings were lower at medium altitude (>2000m); because a lower score on the scale reflects greater soreness, this indicates increased muscle fatigue and soreness. This could be attributed to lower atmospheric pressure and reduced oxygen availability at higher altitudes.
Altitude and Recovery Strategies
Although altitude may not cause a drastic decline in performance, its influence on post-match soreness and overall readiness highlights the need for enhanced recovery protocols - such as extended recovery periods and adjusted training loads - for athletes competing at higher altitudes. Effectively managing post-match soreness, through consistent data collection, is particularly important in these environments to maintain performance and reduce injury risk.
Further Research
These findings highlight the need for further research into the long-term impacts of training and competing at higher altitudes, alongside the development of effective recovery strategies to optimise athlete performance and well-being in such challenging environments. Additionally, future research should explore post-match soreness trends over subsequent days to better understand how altitude influences recovery and the return to baseline levels.