Does Altitude Affect Match-Day Physical Performance and Post-Match Soreness of Elite Footballers?

The data presented in this report were collected over four Major League Soccer (MLS) seasons and include 19 outfield players from Philadelphia Union. This report investigates the impact of three altitude classifications, Sea (0-100m), Low (1000-2000m) and Medium (>2000m), on match-day physical outputs, with a particular focus on high-speed running (HSR) and post-match soreness.

The impact of altitude on match-day performance and post-match soreness is underpinned by well-established physiological responses to hypoxia, a deficiency in the amount of oxygen reaching the tissues (Chen et al. 2020). Reduced oxygen availability at altitude lowers arterial oxygen saturation, which impairs aerobic energy production and limits the capacity for sustained high-intensity efforts (Gore, Clark, and Saunders 2007). As a result, athletes may rely more heavily on anaerobic pathways, leading to greater lactate accumulation, increased neuromuscular fatigue and heightened delayed-onset muscle soreness (Chapman, Stray-Gundersen, and Levine 1998). Acute exposure, without time to acclimatise, has been shown to reduce total distance covered and sprint frequency in elite footballers during match play (Aughey et al. 2013). The Central Governor Theory also posits that the brain may downregulate muscle recruitment in hypoxic conditions to protect homeostasis, further affecting physical output (Noakes 2007).

The first step of the analysis was to load the necessary packages.

#loading packages
library(ggplot2) #main plotting system in R
library(readxl) #allows reading of excel files
library(googlesheets4) #enables access to google sheets
library(janitor) #data cleaning e.g., column names
library(tidyverse) #a collection of packages that includes ggplot2, dplyr and readr
library(lme4) #fits linear mixed-effects models
library(emmeans) #enables estimated marginal means
library(dplyr) #tidyverse package for data manipulation
library(esquisse) #provides a platform to build graphs
library(effectsize) #generates a number of statistics which aid interpretation beyond p-values
library(shinydashboard) #extension from Shiny to produce dashboard
library(shiny) #building interactive web applications
library(kableExtra) #enhances tables for HTML output
library(readr) #for reading text data

Set up working directory.

setwd("~/Library/CloudStorage/OneDrive-Personal/Teesside University/Semester 2/R Studio/Work Area/Assessment")

Imported data.

#import data
data <- read_csv("Assessment Data.csv")
    view(data)

Column names were cleaned to put them in a consistent format by removing special characters and spaces and converting uppercase letters to lowercase. This keeps the data tidy and helps avoid errors later in the analysis.

  data <- clean_names(data)

As the below variables are categorical they were changed to factors.

 colnames(data)

 [1] "id"                 "date"               "altitude_code"     
 [4] "gd"                 "replication"        "timein_environment"
 [7] "altitude"           "duration"           "distance"          
[10] "distancemin"        "high_speed_running" "hs_rmin"           
[13] "player_load"        "playerloadmin"      "hr_zone3time"      
[16] "hr_zone4time"       "hr_zone5time"       "hr_zone6time"      
[19] "h_rzone_trimp"      "soreness_1"         "mood_1"            
[22] "nutrition_1"        "rpe_leg"            "rpe_breathe"       
[25] "rpe_tech"           "rpe_session"

   data$id <- as.factor(data$id)  
   #data$date <- as.Date(data$date,format="%m-%d-%Y") #date being an issue
   data$altitude_code<- as.factor(data$altitude_code)
   data$gd<- as.factor(data$gd)
   data$replication<- as.factor(data$replication)
   
   view(data)

The altitude levels were renamed from 1, 2 and 3 to Sea, Low and Medium to help with clarity and prevent confusion.

data <- data %>%           #renamed altitude levels for clarity
     mutate(altitude_code = recode(altitude_code, 
                                   `1` = "Sea", 
                                   `2` = "Low", 
                                   `3` = "Medium"))
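
A base R alternative to the recode step, sketched here on made-up codes rather than the report's data: `factor()` with explicit `levels` and `labels` renames the codes and fixes the level order in one call. The vector `codes` is hypothetical and only stands in for `data$altitude_code`.

```r
# Hypothetical altitude codes standing in for data$altitude_code
codes <- c(1, 3, 2, 1, 2)

# levels = the raw codes, labels = the names they should carry;
# the order given here becomes the order used on plot axes and in model contrasts
altitude <- factor(codes, levels = c(1, 2, 3), labels = c("Sea", "Low", "Medium"))

levels(altitude)
# "Sea" "Low" "Medium"
as.character(altitude)
```

Setting the order explicitly keeps Sea as the reference level, which is what the model output later in the report assumes.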

The same was done with the game day codes.

data <- data %>%           #renamed game day for clarity
     mutate(gd = recode(gd, 
                                   `0` = "MD", 
                                   `1` = "MD-1", 
                                   `2` = "MD-2"))

Variables that were not needed were removed at this stage. The heart rate variables had many missing values, which made them difficult to use.

#wrangle
    data <- data %>%     #removed heart rate data as was very sparse with lots of missing values
     select(-c(hr_zone3time, hr_zone4time, hr_zone5time, hr_zone6time, h_rzone_trimp))
   view(data)

Once the above steps were completed, I decided on the variables to analyse.

Previous studies have shown that altitude exposure reduces total distance and sprint frequency in footballers (Aughey et al. 2013). Consequently, I was interested in exploring whether a similar trend would emerge for HSR.

I aimed to visualise and explore whether altitude had any impact on HSR, expressed in metres per minute, across the different altitude levels. HSR was chosen due to its recognised importance in team sports, where it is considered a key determinant of successful performance (Gualtieri et al. 2023). HSR (m/min) was preferred over HSR (m) as it is a relative measure, allowing for fair comparisons between players with varying playing times.
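
To make the relative measure concrete, here is a small worked example with invented numbers (not taken from the dataset). It assumes HSR (m/min) is simply HSR distance divided by minutes played: two players with the same absolute HSR but different playing times end up with very different values.

```r
# Hypothetical players: same absolute HSR, different minutes played
hsr_m    <- c(player_a = 540, player_b = 540)  # high-speed running distance (m)
duration <- c(player_a = 90,  player_b = 45)   # minutes played

# Relative measure: metres of HSR per minute on the pitch
hsr_per_min <- hsr_m / duration
hsr_per_min
# player_a player_b
#        6       12
```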

ggplot(data %>% filter(gd == "MD")) + #filter for only match days 
     aes(x = altitude_code, y = hs_rmin, fill = altitude_code) + #chose hs_rmin to diminish the effects of minutes played. 
     geom_boxplot() +
     geom_point( colour = "red") + #adding in the data points and colouring red
     scale_fill_manual(
       values = c(Sea = "#011925",
                  Low = "#418cdd",
                  Medium = "#c3a871")
     ) +
     theme_classic() +
     labs(
       x = "Altitude Category",  # Rename x-axis label
       y = "High-Speed Running (m/min)",  # Rename y-axis label
       fill = "Altitude Level",  # Rename legend title
       title = "The Impact of Altitude on High-Speed Running (m/min) Performance"
     )

  match_data <- data %>% #keep match-day rows as a separate dataset
    filter(gd == "MD")

Next, I analysed the same variables, HSR (m/min) and altitude, whilst accounting for the fact that different players contributed data to each altitude category.

To analyse and interpret these data I used a linear mixed model, with altitude as a fixed effect and player as a random intercept, to assess the relationship between altitude exposure and performance metrics such as HSR and, later in the analysis, soreness. The random intercept accounts for repeated observations from the same player, an approach widely used in sport science. Following the model, I applied estimated marginal means (EMM) to compare the adjusted means across altitude conditions, using pairwise contrasts with Tukey-adjusted P-values. This allowed for a more accurate interpretation of the effects of altitude on performance.

#analysing HSR at different altitude levels whilst taking into account different people in each category
  l_model1 <- lmer(hs_rmin ~ altitude_code + (1 | id), match_data )
   summary(l_model1)

Linear mixed model fit by REML ['lmerMod']
Formula: hs_rmin ~ altitude_code + (1 | id)
   Data: match_data

REML criterion at convergence: 167.5

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.59107 -0.56127 -0.05725  0.47690  2.02692 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 3.737    1.933   
 Residual             1.620    1.273   
Number of obs: 42, groups:  id, 20

Fixed effects:
                    Estimate Std. Error t value
(Intercept)           6.2710     0.5397  11.619
altitude_codeLow     -0.9842     0.5124  -1.921
altitude_codeMedium  -0.3737     0.6232  -0.600

Correlation of Fixed Effects:
            (Intr) altt_L
altitd_cdLw -0.360       
alttd_cdMdm -0.315  0.250

 # comparing between altitudes   
   emm <- emmeans(l_model1, pairwise ~ altitude_code)
  
  #confidence interval (95%) from estimated marginal means
  confint(emm)

$emmeans
 altitude_code emmean    SE   df lower.CL upper.CL
 Sea             6.27 0.543 25.5     5.15     7.39
 Low             5.29 0.600 31.9     4.06     6.51
 Medium          5.90 0.692 37.5     4.50     7.30

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast     estimate    SE   df lower.CL upper.CL
 Sea - Low       0.984 0.521 24.5   -0.314     2.28
 Sea - Medium    0.374 0.636 25.8   -1.207     1.95
 Low - Medium   -0.610 0.718 26.9   -2.390     1.17

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 
Conf-level adjustment: tukey method for comparing a family of 3 estimates

  print (emm)

$emmeans
 altitude_code emmean    SE   df lower.CL upper.CL
 Sea             6.27 0.543 25.5     5.15     7.39
 Low             5.29 0.600 31.9     4.06     6.51
 Medium          5.90 0.692 37.5     4.50     7.30

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast     estimate    SE   df t.ratio p.value
 Sea - Low       0.984 0.521 24.5   1.890  0.1628
 Sea - Medium    0.374 0.636 25.8   0.588  0.8279
 Low - Medium   -0.610 0.718 26.9  -0.851  0.6752

Degrees-of-freedom method: kenward-roger 
P value adjustment: tukey method for comparing a family of 3 estimates

From the outputs provided from the code above, the following observations can be made:

Sea Level vs Low Altitude: Players at low altitude recorded 0.984 m/min less HSR compared to sea level (P=0.1628)

Sea Level vs Medium Altitude: Players at medium altitude recorded 0.374 m/min less HSR compared to sea level (P=0.8279)

Low vs Medium Altitude: Players at medium altitude recorded 0.610 m/min more HSR compared to low altitude (P=0.6752)

In all comparisons, the P-values exceeded 0.05, indicating no statistically significant differences in HSR (m/min) between altitude categories. These findings are consistent with previous research suggesting that altitude has minimal impact on HSR performance (Draper et al. 2022). One potential explanation is that HSR primarily relies on anaerobic energy systems, which are less dependent on oxygen availability (Peronnet, Thibault, and Cousineau 1991). Given that altitude-induced hypoxia predominantly affects aerobic capacity, its influence is more pronounced during sustained aerobic efforts than during short, high-intensity bursts (Peronnet, Thibault, and Cousineau 1991). Therefore, it may have been more informative to analyse the effects of altitude on aerobic-based metrics, such as distance covered during sustained periods of possession.

Following these findings, individual trends were investigated by plotting each player’s HSR (m/min) against altitude.

#Checking individual trends by plotting each player’s HSR vs. altitude
  ggplot(match_data, aes(x = altitude_code, y = hs_rmin, group = id, color = id)) +
    geom_line() +
    geom_point() +
    theme_classic() +
    labs(title = "Player-Specific Trends in HSR Across Altitudes",
         x = "Altitude Category",
         y = "High-Speed Running (m/min)") +
    theme(legend.position = "none")

This graph was of limited use because only a small number of players had data reported at all three altitudes. A key limitation of the data used in this report is the lack of completeness, which restricts the depth of analysis. To strengthen future investigations, increasing participant numbers and ensuring consistent data compliance are essential.

Following the above findings, I decided to combine low and medium altitude to compare data at sea level with data at altitude (>1000m), to try to reduce noise and make the comparison more focused.

 #combining low and medium altitude to compare to sea level to reduce noise and make comparison more focused
  match_data <- match_data %>%
    mutate(altitude_group = ifelse(altitude_code == "Sea", "Sea", "Altitude"))
  view(match_data)

I repeated the graph previously presented but this time looking at sea level and altitude as opposed to three different altitude categories.

#boxplot to show HSR(m/min) at sea level and altitude
 ggplot(match_data, aes(x = altitude_group, y = hs_rmin, fill = altitude_group)) +
    geom_boxplot() +
    geom_point(color = "red") +
    scale_fill_manual(
      values = c("Sea" = "#011925", "Altitude" = "#c3a871")  # Sea and Altitude colors
    ) +
    theme_classic() +
    theme(legend.position = "none") +  # Remove the legend
    labs(
      x = "Altitude Group",
      y = "High-Speed Running (m/min)",
      title = "Comparison of HSR/min Between Sea Level and Altitude"
    )

Following this, analysis of HSR (m/min) and altitude (sea and altitude) was carried out but whilst taking into account different participants in each category.

#linear mix model with new altitude groups 
   l_model2 <- lmer(hs_rmin ~ altitude_group + (1 | id), data = match_data)
  summary(l_model2)

Linear mixed model fit by REML ['lmerMod']
Formula: hs_rmin ~ altitude_group + (1 | id)
   Data: match_data

REML criterion at convergence: 169.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.3712 -0.5724 -0.1238  0.4642  1.9179 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 3.567    1.889   
 Residual             1.648    1.284   
Number of obs: 42, groups:  id, 20

Fixed effects:
                  Estimate Std. Error t value
(Intercept)         5.5190     0.5242  10.529
altitude_groupSea   0.7564     0.4439   1.704

Correlation of Fixed Effects:
            (Intr)
altitd_grpS -0.403

  #fit an intercept-only model (no altitude term) as a baseline for comparison with l_model2
  #residual variance is similar with and without altitude (1.65 vs 1.79), suggesting altitude explains little within-player variation in HSR (m/min)
  l_model3 <- lmer(hs_rmin ~ 1 + (1 | id), data = match_data)
  summary(l_model3)

Linear mixed model fit by REML ['lmerMod']
Formula: hs_rmin ~ 1 + (1 | id)
   Data: match_data

REML criterion at convergence: 172.4

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.68343 -0.57864 -0.07324  0.53997  2.03570 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 3.472    1.863   
 Residual             1.792    1.339   
Number of obs: 42, groups:  id, 20

Fixed effects:
            Estimate Std. Error t value
(Intercept)   5.8744     0.4791   12.26
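
The two models above (with and without the altitude term) are not formally compared in the output. One common way to do so is a likelihood ratio test via `anova()`, which refits both models with maximum likelihood before comparing them. The sketch below uses simulated data: the variable names mirror the report, but the numbers are invented and the result says nothing about the real dataset.

```r
library(lme4)

set.seed(1)
# Simulated stand-in for match_data: 20 players, 3 matches each
n_players <- 20
sim <- data.frame(
  id = factor(rep(seq_len(n_players), each = 3)),
  altitude_group = factor(rep(c("Sea", "Altitude", "Sea"), times = n_players))
)
player_effect <- rnorm(n_players, mean = 0, sd = 2)   # between-player differences
sim$hs_rmin <- 6 + player_effect[as.integer(sim$id)] +
  ifelse(sim$altitude_group == "Altitude", -0.8, 0) + # assumed altitude effect
  rnorm(nrow(sim), sd = 1.3)                          # match-to-match noise

m_null <- lmer(hs_rmin ~ 1 + (1 | id), data = sim)              # no altitude term
m_alt  <- lmer(hs_rmin ~ altitude_group + (1 | id), data = sim) # with altitude term

# Likelihood ratio test; anova() refits both models with ML by default
lrt <- anova(m_null, m_alt)
lrt
p_value <- lrt[["Pr(>Chisq)"]][2]
```

Applied to the report's data, the same call would be `anova(l_model3, l_model2)`.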

  # comparing between sea level and altitude
  emm2 <- emmeans(l_model2, pairwise ~ altitude_group)
  emm2

$emmeans
 altitude_group emmean    SE   df lower.CL upper.CL
 Altitude         5.52 0.526 25.1     4.44     6.60
 Sea              6.28 0.536 26.0     5.17     7.38

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast       estimate   SE   df t.ratio p.value
 Altitude - Sea   -0.756 0.45 25.4  -1.679  0.1053

Degrees-of-freedom method: kenward-roger

  #confidence interval (95%) from estimated marginal means
  confint(emm2)

$emmeans
 altitude_group emmean    SE   df lower.CL upper.CL
 Altitude         5.52 0.526 25.1     4.44     6.60
 Sea              6.28 0.536 26.0     5.17     7.38

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast       estimate   SE   df lower.CL upper.CL
 Altitude - Sea   -0.756 0.45 25.4    -1.68    0.171

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95

On average, players covered 0.756 m/min less HSR at altitude compared to sea level. However, this difference was not statistically significant (P=0.1053), and the 95% confidence interval (-1.68 to 0.171) crosses zero. While the P-value exceeds the conventional threshold of 0.05, it is closer to significance than any of the comparisons across the three altitude categories. This may indicate that altitude has some effect on HSR (m/min), but that the current sample size or the variability within the data limits the ability to detect a statistically significant difference.
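
The confidence interval and P-value for this contrast can be reproduced by hand from the estimate, standard error and Kenward-Roger degrees of freedom printed above, which makes the link between the two explicit. This is only an arithmetic check using the rounded values from the output, so the last digit may differ slightly.

```r
estimate <- -0.756  # Altitude - Sea contrast (m/min)
se       <- 0.45    # standard error of the contrast
df       <- 25.4    # Kenward-Roger degrees of freedom

# 95% CI: estimate +/- critical t value * SE
t_crit <- qt(0.975, df)
ci <- estimate + c(-1, 1) * t_crit * se
ci

# Two-sided P-value from the t ratio as printed in the output
t_ratio <- -1.679
p_value <- 2 * pt(-abs(t_ratio), df)
p_value
```

Because the interval is built from the same t distribution as the P-value, an interval that includes zero and a P-value above 0.05 are two views of the same result.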

Similar to before, I then checked the individual trends among the players.

 #Checking individual trends by plotting each player’s HSR vs. altitude
  ggplot(match_data, aes(x = altitude_group, y = hs_rmin, group = id, color = id)) +
    geom_line(alpha = 0.6) +   # Connects points for each player
    geom_point(size = 3) +     # Shows individual data points
    theme_classic() +
    labs(title = "Individual HSR Trends Across Altitude",
         x = "Altitude Category",
         y = "High-Speed Running (m/min)") +
    theme(legend.position = "none")

Research shows that reductions in HSR are commonly observed during the second half of match play (Sparks, Coetzee, and Gabbett 2016). Consequently, further research and analysis should consider separating HSR data by halves to provide more detailed insights. This approach could also support conclusions regarding the impact of fatigue, particularly if rating of perceived exertion were collected more frequently during match play.

Furthermore, previous research has demonstrated that high-intensity actions, such as HSR, are key determinants of goal-scoring opportunities (Faude, Koch, and Meyer 2012). Therefore, if match outcomes had been available, further analysis could have explored whether variation in HSR outputs influenced match results.

Following the findings above, and due to the paucity of research, the effects of altitude on post-match soreness were investigated.

Mean soreness and mean HSR (m/min) per player on match days were calculated for each altitude category. Soreness data were collected the day after a match was played.

 match_data <- data %>% 
    filter(gd == "MD")  # Keep only Match Day data
  
  soreness_1 <- match_data %>%
    group_by(id, altitude_code) %>%  # Grouped by player ID and altitude code
    summarize(
      mean_soreness_1 = mean(soreness_1, na.rm = TRUE),  # Mean soreness per player
      mean_hs_rmin = mean(hs_rmin, na.rm = TRUE)  # Mean high-speed running per player
    )

`summarise()` has grouped output by 'id'. You can override using the `.groups`
argument.
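
The message above is informational: after `summarise()` the result is still grouped by `id`. A minimal sketch, on a toy data frame with invented values, of passing `.groups = "drop"` so the summary comes back ungrouped and the message is not shown:

```r
library(dplyr)

# Toy stand-in for match_data (values invented for illustration)
toy <- tibble(
  id = factor(c(1, 1, 2, 2, 2)),
  altitude_code = factor(c("Sea", "Sea", "Sea", "Low", "Low")),
  soreness_1 = c(6, 7, 5, NA, 4),
  hs_rmin = c(5.1, 6.3, 4.8, 7.0, 6.2)
)

toy_summary <- toy %>%
  group_by(id, altitude_code) %>%
  summarise(
    mean_soreness_1 = mean(soreness_1, na.rm = TRUE),  # NA values are ignored
    mean_hs_rmin = mean(hs_rmin, na.rm = TRUE),
    .groups = "drop"  # return an ungrouped tibble
  )

toy_summary
```

Whether to drop the grouping is a style choice; it does not change the calculated means.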

summary(soreness_1)

       id     altitude_code mean_soreness_1  mean_hs_rmin   
 10     : 3   Sea   :14     Min.   :2.000   Min.   : 2.611  
 109    : 3   Low   :11     1st Qu.:4.000   1st Qu.: 4.478  
 2      : 2   Medium: 8     Median :6.000   Median : 5.668  
 8      : 2                 Mean   :5.462   Mean   : 5.795  
 27     : 2                 3rd Qu.:6.833   3rd Qu.: 6.875  
 64     : 2                 Max.   :8.000   Max.   :11.778  
 (Other):19                 NA's   :2

The effect of altitude on soreness over the three different altitude groups was then visually displayed.

# Create a boxplot comparing soreness across all three altitude groups (Sea, Low, Medium)
ggplot(soreness_1, aes(x = altitude_code, y = mean_soreness_1, fill = altitude_code)) +
  geom_boxplot() +  # Create the boxplot
  geom_point(color = "red", size = 2, position = position_dodge(width = 0.75)) +  # Add red points in a straight line
  scale_fill_manual(
    values = c("Sea" = "#011925", 
               "Low" = "#418cdd", 
               "Medium" = "#c3a871")  # Assign different colors to each altitude
  ) +
  theme_classic() +  # Use classic theme
  theme(legend.position = "none") +  # Remove the legend
  labs(
    x = "Altitude Code",  # Label for the x-axis
    y = "Mean Soreness",  # Label for the y-axis
    fill = "Altitude Group",  # Legend title
    title = "Comparison of Soreness Across Sea, Low, and Medium Altitudes"
  )

Statistical tests were then run to determine whether the data showed real, meaningful patterns rather than random noise.

# Fit a mixed-effects model (altitude_code as fixed effect, id as random effect)
lm_model_mixed <- lmer(mean_soreness_1 ~ altitude_code + (1 | id), data = soreness_1)

boundary (singular) fit: see help('isSingular')

summary(lm_model_mixed)

Linear mixed model fit by REML ['lmerMod']
Formula: mean_soreness_1 ~ altitude_code + (1 | id)
   Data: soreness_1

REML criterion at convergence: 95.8

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.7707 -0.2348 -0.1057  0.6148  1.5849 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 0.0      0.000   
 Residual             1.4      1.183   
Number of obs: 31, groups:  id, 19

Fixed effects:
                     Estimate Std. Error t value
(Intercept)          6.277778   0.341506  18.383
altitude_codeLow    -0.005051   0.493817  -0.010
altitude_codeMedium -3.152778   0.539968  -5.839

Correlation of Fixed Effects:
            (Intr) altt_L
altitd_cdLw -0.692       
alttd_cdMdm -0.632  0.437
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

# Calculate the estimated marginal means (EMMs) for each altitude code
emm_result <- emmeans(lm_model_mixed, pairwise ~ altitude_code)
summary(emm_result)

$emmeans
 altitude_code emmean    SE df lower.CL upper.CL
 Sea             6.28 0.352 28     5.56     7.00
 Low             6.27 0.369 28     5.52     7.03
 Medium          3.12 0.439 28     2.23     4.02

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast     estimate    SE   df t.ratio p.value
 Sea - Low     0.00505 0.509 19.5   0.010  0.9999
 Sea - Medium  3.15278 0.562 23.6   5.606  <.0001
 Low - Medium  3.14773 0.575 25.1   5.471  <.0001

Degrees-of-freedom method: kenward-roger 
P value adjustment: tukey method for comparing a family of 3 estimates

# Confidence intervals (95%) for the EMMs
confint(emm_result)

$emmeans
 altitude_code emmean    SE df lower.CL upper.CL
 Sea             6.28 0.352 28     5.56     7.00
 Low             6.27 0.369 28     5.52     7.03
 Medium          3.12 0.439 28     2.23     4.02

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast     estimate    SE   df lower.CL upper.CL
 Sea - Low     0.00505 0.509 19.5    -1.28     1.29
 Sea - Medium  3.15278 0.562 23.6     1.75     4.56
 Low - Medium  3.14773 0.575 25.1     1.71     4.58

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 
Conf-level adjustment: tukey method for comparing a family of 3 estimates

print(emm_result)

$emmeans
 altitude_code emmean    SE df lower.CL upper.CL
 Sea             6.28 0.352 28     5.56     7.00
 Low             6.27 0.369 28     5.52     7.03
 Medium          3.12 0.439 28     2.23     4.02

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast     estimate    SE   df t.ratio p.value
 Sea - Low     0.00505 0.509 19.5   0.010  0.9999
 Sea - Medium  3.15278 0.562 23.6   5.606  <.0001
 Low - Medium  3.14773 0.575 25.1   5.471  <.0001

Degrees-of-freedom method: kenward-roger 
P value adjustment: tukey method for comparing a family of 3 estimates

The results show that the differences between sea level and medium altitude, and between low and medium altitude, are statistically significant (P<0.0001). In contrast, the difference between sea level and low altitude is not significant (P=0.9999). Estimated marginal means indicate that average soreness scores at sea level and low altitude are nearly identical (6.28 and 6.27, respectively), whereas medium altitude is associated with a significantly lower average soreness score of 3.12 (95% CI 2.23 to 4.02). This suggests that playing at altitudes above 2000m reduces soreness scores by approximately 3 AU on the Likert scale (demonstrating an increase in soreness), with Tukey-adjusted 95% confidence intervals for the difference of 1.75 to 4.56 versus sea level and 1.71 to 4.58 versus low altitude.

As with HSR, low and medium altitude were then combined to form 'altitude' so that sea level could be compared directly with altitude.

# Recode altitude_code to combine 'Low' and 'Medium' into 'Altitude', keeping 'Sea' as 'Sea'
soreness_1_combined <- soreness_1 %>%
  mutate(altitude_code = ifelse(altitude_code == "Sea", "Sea", "Altitude"))

I then viewed this as a boxplot.

# Reorder factor levels so that 'Sea' is on the left and 'Altitude' is on the right
soreness_1_combined <- soreness_1_combined %>%
  mutate(altitude_code = factor(altitude_code, levels = c("Sea", "Altitude")))

# Create a boxplot comparing soreness between Sea and combined Altitude (Low and Medium)
ggplot(soreness_1_combined, aes(x = altitude_code, y = mean_soreness_1, fill = altitude_code)) +
  geom_boxplot() +  # Create the boxplot
  geom_point(color = "red", size = 2, position = position_dodge(width = 0.75)) +  # Add red points in a straight line
  scale_fill_manual(
    values = c("Sea" = "#011925", 
               "Altitude" = "#c3a871")  # Combine Low and Medium into 'Altitude'
  ) +
  theme_classic() +  # Use classic theme
  labs(
    x = "Altitude Group",  # Label for the x-axis
    y = "Mean Soreness",  # Label for the y-axis
    fill = "Altitude Group",  # Legend title
    title = "Comparison of Soreness Between Sea Level and Altitude (Low and Medium Combined)"
  )

The mixed model and estimated marginal means were then repeated for the combined altitude groups.

# Fit a mixed-effects model (altitude_code as fixed effect, id as random effect)
lm_model_combined_random <- lmer(mean_soreness_1 ~ altitude_code + (1 | id), data = soreness_1_combined)

boundary (singular) fit: see help('isSingular')

summary(lm_model_combined_random)

Linear mixed model fit by REML ['lmerMod']
Formula: mean_soreness_1 ~ altitude_code + (1 | id)
   Data: soreness_1_combined

REML criterion at convergence: 118.9

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.91369 -0.55311  0.03073  0.61457  1.78224 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 0.000    0.000   
 Residual             2.934    1.713   
Number of obs: 31, groups:  id, 19

Fixed effects:
                      Estimate Std. Error t value
(Intercept)             6.2778     0.4944  12.697
altitude_codeAltitude  -1.3304     0.6316  -2.107

Correlation of Fixed Effects:
            (Intr)
alttd_cdAlt -0.783
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

# Calculate the estimated marginal means (EMMs) for the combined altitude groups (Sea vs. Altitude)
emm_result_combined <- emmeans(lm_model_combined_random, pairwise ~ altitude_code)
summary(emm_result_combined)

$emmeans
 altitude_code emmean    SE   df lower.CL upper.CL
 Sea             6.28 0.509 29.0     5.24     7.32
 Altitude        4.95 0.408 25.8     4.11     5.79

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast       estimate    SE df t.ratio p.value
 Sea - Altitude     1.33 0.651 21   2.045  0.0537

Degrees-of-freedom method: kenward-roger

# Confidence intervals (95%) for the EMMs
confint(emm_result_combined)

$emmeans
 altitude_code emmean    SE   df lower.CL upper.CL
 Sea             6.28 0.509 29.0     5.24     7.32
 Altitude        4.95 0.408 25.8     4.11     5.79

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast       estimate    SE df lower.CL upper.CL
 Sea - Altitude     1.33 0.651 21   -0.023     2.68

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95

print(emm_result_combined)

$emmeans
 altitude_code emmean    SE   df lower.CL upper.CL
 Sea             6.28 0.509 29.0     5.24     7.32
 Altitude        4.95 0.408 25.8     4.11     5.79

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast       estimate    SE df t.ratio p.value
 Sea - Altitude     1.33 0.651 21   2.045  0.0537

Degrees-of-freedom method: kenward-roger

When comparing sea level to altitude, with altitude including all matches played at or above 1000m, the difference in soreness was not statistically significant (P=0.0537). However, as the P-value is very close to the significance threshold of 0.05, there may be an effect that would reach statistical significance with a larger sample size or more data points. This implies that the current sample may not have enough power to detect the effect conclusively. Further analysis could provide more clarity and confirm whether this trend reflects a true difference in soreness between the two conditions.

The results suggest that soreness will, on average, increase by 1.33 AU at altitude (a 1.33 AU lower score than at sea level). The 95% confidence interval for this difference runs from -0.023 to 2.68, so whilst the lower limit is negligible and falls marginally below zero, the upper limit is substantial. This should be considered when programming and incorporating recovery strategies, as a change of 1 on the Likert scale is deemed the smallest worthwhile change.

There is limited research exploring the effect of altitude on soreness levels. However, the findings from this analysis align with those of Rojas-Valverde et al. (2019), who observed increased levels of delayed-onset muscle soreness (DOMS) at higher altitudes. It is important to note, however, that Rojas-Valverde et al. (2019) also included heat as a variable in their study, which could have influenced their results. Therefore, while our findings suggest a similar trend, further research isolating altitude as the primary factor is needed to confirm the relationship between altitude and soreness levels.

While our findings show an increase in soreness at altitude, research has reported that higher altitude hypoxia does not significantly affect muscle function (Edwards et al. 2010), which implies that other factors might be contributing to the observed soreness. Given this, it is possible that environmental stressors, such as temperature, humidity, or the body’s acclimatisation to altitude, rather than altitude alone, are influencing soreness levels. Again, more controlled studies isolating altitude as the primary factor are needed to confirm its direct impact on soreness. Furthermore, monitoring soreness over the next few days following a match would be useful to analyse how long it takes for soreness levels to normalise between individuals.

A graph was generated to plot mean match-day soreness and HSR (m/min) with altitude.

ggplot(soreness_1, aes(x = mean_soreness_1, y = mean_hs_rmin, color = altitude_code)) + 
  geom_point(size = 3, alpha = 0.8) +  # Scatter points
  geom_smooth(method = "lm", se = FALSE) +  # Add trend lines per altitude
  scale_color_manual(
    values = c(Sea = "#082143", Low = "#283FCB", Medium = "#E6D23E")
  ) +
  theme_minimal() +
  labs(
    x = "Mean Soreness Match Day +1",
    y = "Mean High-Speed Running (m/min)",
    color = "Altitude Level",
    title = "Relationship Between Soreness and High-Speed Running at Different Altitudes"
  )

`geom_smooth()` using formula = 'y ~ x'

The trend lines show that at medium altitude and sea level, higher HSR (m/min) output was associated with higher post-match soreness. At low altitude the opposite was observed, with greater HSR (m/min) relating to lower reported soreness.

From the code and findings above, I then generated an interactive dashboard to present the key messages to the coaches.

# UI ----
ui <- dashboardPage(
  dashboardHeader(title = "Matchday Altitude Dashboard"),
  dashboardSidebar(
    sidebarMenu(
      menuItem("Boxplot Analysis", tabName = "boxplot", icon = icon("chart-bar")),
      menuItem("Lollipop Graph", tabName = "lollipop", icon = icon("chart-line"))
    )
  ),
  
  dashboardBody(
    tabItems(
      
      # 📌 Boxplot Tab ----
      tabItem(tabName = "boxplot",
              selectInput("view_option", "View Option:", 
                          choices = c("Single Metric", "Both Metrics"), 
                          selected = "Single Metric"),
              
              selectInput("metric", "Select Metric:", 
                          choices = c("HSR per min" = "hsr", "Soreness" = "soreness"), 
                          selected = "hsr"),
              
              fluidRow(
                box(plotOutput("altitude_plot"), width = 12)
              ),
              
              # Conditional Summary Box (Only for Both Metrics)
              conditionalPanel(
                condition = "input.view_option == 'Both Metrics'",
                box(
                  title = "What Impact does Altitude have on Matchday HSR (m/min) and Soreness?",
                  width = 12,
                  textOutput("summary_text")
                )
              )
      ),
      
      
      # 📌 Lollipop Graph Tab ----
      tabItem(tabName = "lollipop",
              selectInput("view_option_lollipop", "View Option:", 
                          choices = c("Single Metric", "Both Metrics"), 
                          selected = "Single Metric"),
              
              selectInput("lollipop_metric", "Select Metric:", 
                          choices = c("HSR per min" = "hsr", "Soreness" = "soreness"), 
                          selected = "hsr"),
              
              # Single Metric Lollipop Plot
              conditionalPanel(
                condition = "input.view_option_lollipop == 'Single Metric'",
                box(plotOutput("lollipop_plot"), width = 12)
              ),
              
              # Both Metrics Lollipop Plots Side by Side
              conditionalPanel(
                condition = "input.view_option_lollipop == 'Both Metrics'",
                fluidRow(
                  box(plotOutput("lollipop_plot_hsr"), width = 6),
                  box(plotOutput("lollipop_plot_soreness"), width = 6)
                ),
                # Conditional Summary Box for Lollipop Graph
                box(
                  title = "What Impact does Altitude have on Matchday HSR (m/min) and Soreness Across Individuals?",
                  width = 12,
                  textOutput("summary_text_lollipop")
                )
              )
      )
    )
  )
)

# Server ----
server <- function(input, output) {
  
  # Convert data to long format for boxplot
  soreness_1_long <- soreness_1 %>%
    pivot_longer(cols = c(mean_hs_rmin, mean_soreness_1), names_to = "Metric", values_to = "Value") %>%
    mutate(Metric = recode(Metric, mean_hs_rmin = "HSR (m/min)", mean_soreness_1 = "Soreness (1-10)"))
  
  output$altitude_plot <- renderPlot({
    if (input$view_option == "Single Metric") {
      metric_column <- ifelse(input$metric == "hsr", "mean_hs_rmin", "mean_soreness_1")
      metric_label <- ifelse(input$metric == "hsr", "Matchday Mean High-Speed Running (m/min)", "Matchday Mean Soreness")
      
      ggplot(soreness_1, aes(x = altitude_code, y = .data[[metric_column]], fill = altitude_code)) +
        geom_boxplot(outlier.shape = NA) +  
        geom_point(color = "red", size = 2, alpha = 0.8, position = position_nudge(x = 0)) +  
        scale_fill_manual(values = c(Sea = "#011925", Low  = "#418cdd", Medium = "#c3a871")) +
        scale_x_discrete(labels = c("Sea" = "Sea (0-1000m)", "Low" = "Low (1000-2000m)", "Medium" = "Medium (>2000m)")) + 
        theme_classic() + theme(legend.position = "none") +
        labs(
          x = "Altitude Category",
          y = metric_label,  
          fill = "Altitude Level",
          title = paste("Impact of Altitude on", metric_label)
        )
    } else {
      ggplot(soreness_1_long, aes(x = altitude_code, y = Value, fill = altitude_code)) +
        geom_boxplot(outlier.shape = NA) +  
        geom_point(color = "red", size = 2, alpha = 0.8, position = position_nudge(x = 0)) +  
        facet_wrap(~ Metric, scales = "free") +
        scale_fill_manual(values = c(
            "Sea" = "#011925",
            "Low" = "#418cdd",
            "Medium" = "#c3a871"
          )) +
        scale_x_discrete(labels = c("Sea" = "Sea (0-1000m)", "Low" = "Low (1000-2000m)", "Medium" = "Medium (>2000m)")) + 
        theme_classic() + theme(legend.position = "none") +
        labs(
          x = "Altitude Category",
          y = "",  
          fill = "Altitude Level",
          title = "Impact of Altitude on Matchday HSR (m/min) and Soreness"
        )
    }
  })
  
  # Compute individual differences for Lollipop Graph
  individual_differences <- soreness_1 %>%
    group_by(id) %>%
    summarize(
      hsr_diff = mean(mean_hs_rmin[altitude_code != "Sea"], na.rm = TRUE) - mean(mean_hs_rmin[altitude_code == "Sea"], na.rm = TRUE),
      soreness_diff = mean(mean_soreness_1[altitude_code != "Sea"], na.rm = TRUE) - mean(mean_soreness_1[altitude_code == "Sea"], na.rm = TRUE)
    ) %>%
    pivot_longer(cols = c(hsr_diff, soreness_diff), names_to = "Metric", values_to = "Difference") %>%
    mutate(Metric = recode(Metric, hsr_diff = "HSR per min", soreness_diff = "Soreness"))
  
  render_lollipop <- function(metric_label) {
    plot_data <- individual_differences %>%
      filter(Metric == metric_label, !is.na(Difference))
    
    ggplot(plot_data, aes(x = Difference, y = reorder(id, Difference))) +
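      # Draw the trivial-change band once; geom_rect() with aes() would draw one rectangle per row and stack the alpha
      annotate("rect", xmin = -1, xmax = 1, ymin = -Inf, ymax = Inf, fill = "gray80", alpha = 0.3) +
      # Use the same reordered id for yend as for y so both ends of the segment share one discrete scale
      geom_segment(aes(xend = 0, yend = reorder(id, Difference)), color = "black") +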
      geom_point(size = 4, color = "red") +
      geom_vline(xintercept = 0, linetype = "dashed", color = "black") +
      theme_classic() + theme(legend.position = "none") +
      labs(
        x = "Difference from Sea Level (0-1000m) to Altitude (>1000m)",
        y = "Player ID",
        title = ifelse(
          metric_label == "HSR per min",
          "Individual Differences in Matchday HSR (m/min) from Sea Level to Altitude",
          paste("Individual Differences in", metric_label, "from Sea Level to Altitude")
        )
      )
  }
  
  # Render Single Lollipop Graph
  output$lollipop_plot <- renderPlot({
    render_lollipop(ifelse(input$lollipop_metric == "hsr", "HSR per min", "Soreness"))
  })
  
  # Render Both Lollipop Graphs Side by Side
  output$lollipop_plot_hsr <- renderPlot({ render_lollipop("HSR per min") })
  output$lollipop_plot_soreness <- renderPlot({ render_lollipop("Soreness") })
  
  # Create summary text for findings (Boxplot)
  output$summary_text <- renderText({
    if (input$view_option == "Both Metrics") {
      return("The boxplots show no significant differences in matchday high-speed running (m/min) across the altitude categories, suggesting altitude has minimal impact on matchday running output. However, next-day soreness ratings were lower at medium altitude (>2000m), decreasing from approximately 6/10 to 3/10. This indicates players experienced greater muscle soreness at altitudes above 2000m. On average, soreness scores decreased by 3.15 AU on the Likert Scale at medium altitude, which exceeds the smallest worthwhile change. Furthermore, this reduction is estimated to lie between 2.23 and 4.02 AU, suggesting a meaningful impact. This increased soreness (observed via a reduction in score on the Likert Scale) may reflect greater muscle stress due to lower atmospheric pressure and reduced oxygen availability. These findings highlight the importance of enhanced recovery strategies - such as extended recovery time - when playing at higher altitudes. Overall, the data suggests that while altitude may not drastically alter match-day running performance, it can influence fatigue and recovery.")
    } else {
      metric_label <- ifelse(input$metric == "hsr", "HSR per min", "Soreness")
      return(paste("The analysis for", metric_label, "at different altitudes suggests the impact of altitude on the metric. Further detailed analysis is required to draw conclusions on the impact at different altitude levels."))
    }
  })
  
  # Create summary text for findings (Lollipop)
  output$summary_text_lollipop <- renderText({
    if (input$view_option_lollipop == "Both Metrics") {
      return("When exploring individual responses, 6 out of 10 players showed either an increase in matchday HSR (m/min) accompanied by increased soreness, or a decrease in both HSR (m/min) and soreness at altitude. In contrast, the remaining 4 players exhibited mismatched trends, displaying increased soreness despite a reduction in HSR (m/min) at altitude. Moreover, it is important to note that no player increased in HSR (m/min) or reported less soreness outside of the smallest worthwhile change (+1), whereas the magnitude of the HSR (m/min) decrease and soreness increase was much larger. This highlights the individual variability in adaptation to altitude and the importance of monitoring both internal and external load, further supporting the importance of obtaining soreness data at altitude.")
    } else {
      metric_label <- ifelse(input$lollipop_metric == "hsr", "HSR per min", "Soreness")
      return(paste("The analysis of", metric_label, "individual differences at different altitudes suggests significant changes in performance and discomfort. Further exploration may be needed to assess the implications of these changes at different altitudes."))
    }
  })
}

# Run App ----
shinyApp(ui = ui, server = server)

In relation to the lollipop graphs displayed in this dashboard, the grey shaded region spans from -1 to 1, representing the zone of trivial change - a range within which differences are not considered practically meaningful. This is based on the concept of the smallest worthwhile change (SWC), which, for performance, is often set at 0.2 times the between-subject standard deviation (Fang and Ho 2020). By using a fixed range such as -1 to 1, we provide a visual threshold for interpreting whether observed differences exceed what could be attributed to natural variability or measurement noise.

Research has shown that match-to-match variability in high-speed running is approximately 16% (Gregson et al. 2010). Therefore, to calculate the SWC in high-speed running, 16% of the mean HSR was used, resulting in a value of 0.93. As a result, a threshold of ±1 was applied.

For soreness, the SWC was calculated by taking the mean match-day soreness across the three altitude categories and multiplying it by 0.2: (6.28 + 6.27 + 3.12) / 3 = 5.22, and 5.22 × 0.2 = 1.04. Therefore, a threshold of ±1 was used for soreness.
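The two threshold calculations can be reproduced in a few lines of R. This is a sketch of the arithmetic described above rather than the code used for the report. The mean HSR is not stated in this section, so assumed_mean_hsr is an assumption, back-calculated from the reported SWC (0.93 / 0.16) purely for illustration.

```r
# Mean match-day soreness at the three altitude categories (values from the text).
soreness_means <- c(6.28, 6.27, 3.12)

# SWC for soreness as used in this report: 0.2 x the mean across altitudes.
swc_soreness <- 0.2 * mean(soreness_means)

# SWC for HSR as used in this report: 16% of mean HSR.
# ASSUMPTION: 5.81 m/min is back-calculated from the reported SWC
# (0.93 / 0.16); it is not a value given in the report.
assumed_mean_hsr <- 5.81
swc_hsr <- 0.16 * assumed_mean_hsr

print(round(c(soreness = swc_soreness, hsr = swc_hsr), 2))
```

Both values round to roughly 1, which is consistent with the shared ±1 band drawn on the lollipop graphs.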

From the lollipop graphs it is important to note that no player increased in HSR (m/min) or reported less soreness outside of the smallest worthwhile change (+1), whereas the magnitude of the HSR (m/min) decrease and soreness increase was much larger. The data suggests that whilst there wasn’t a meaningful improvement in performance or recovery (soreness reduction), some players experienced a notable decline in performance (HSR) and an increase in soreness at higher altitudes. This could indicate that altitude negatively affects some players’ physical performance and recovery, but no positive changes (increase in HSR or decrease in soreness) were observed beyond the minimal threshold.
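One way to make this reading of the lollipop graphs explicit is to label each player's difference relative to the ±1 band. The sketch below is illustrative only: classify_change() is a hypothetical helper and example_diff contains made-up values, not the players' data.

```r
# Classify a difference (altitude minus sea level) against a symmetric
# smallest-worthwhile-change threshold. classify_change() is a hypothetical helper.
classify_change <- function(difference, swc = 1) {
  ifelse(difference > swc, "increase",
         ifelse(difference < -swc, "decrease", "trivial"))
}

# Hypothetical example differences, not taken from the report's data.
example_diff <- c(P1 = -2.4, P2 = 0.3, P3 = -0.8, P4 = 1.6)
labels <- classify_change(example_diff)

print(labels)
```

Applied to the Difference column of individual_differences, the same helper would give a per-player label for each metric. For soreness, the sign still has to be read against the direction of the Likert scale.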

Practical Application:

Impact on Match-Day Performance

High-Speed Running (m/min) shows no significant difference across altitude categories, suggesting that performance during match play is not heavily impacted by altitude.

Recovery and Soreness

Soreness ratings were lower at medium altitude (>2000m); because lower scores on this scale reflect greater soreness, this indicates increased muscle fatigue and soreness. This could be attributed to lower atmospheric pressure and reduced oxygen availability at higher altitudes.

Altitude and Recovery Strategies

Although altitude may not cause a drastic decline in performance, its influence on post-match soreness and overall readiness highlights the need for enhanced recovery protocols - such as extended recovery periods and adjusted training loads - for athletes competing at higher altitudes. Effectively managing post-match soreness, through constant data collection, is particularly important in these environments to maintain performance and reduce injury risk.

Further Research

These findings highlight the need for further research into the long-term impacts of training and competing at higher altitudes, alongside the development of effective recovery strategies to optimise athlete performance and well-being in such challenging environments. Additionally, future research should explore post-match soreness trends over subsequent days to better understand how altitude influences recovery and the return to baseline levels.

References

Aughey, Robert J, Kristal Hammond, Matthew C Varley, Walter F Schmidt, Pitre C Bourdon, Martin Buchheit, Ben Simpson, et al. 2013. “Soccer Activity Profile of Altitude Versus Sea-Level Natives During Acclimatisation to 3600m (ISA3600).” British Journal of Sports Medicine 47 (Suppl 1): i107–13. https://doi.org/10.1136/bjsports-2013-092776.

Chapman, Robert F., James Stray-Gundersen, and Benjamin D. Levine. 1998. “Individual Variation in Response to Altitude Training.” Journal of Applied Physiology 85 (4): 1448–56. https://doi.org/10.1152/jappl.1998.85.4.1448.

Chen, Pai-Sheng, Wen-Tai Chiu, Pei-Ling Hsu, Shih-Chieh Lin, I-Chen Peng, Chia-Yih Wang, and Shaw-Jenq Tsai. 2020. “Pathophysiological Implications of Hypoxia in Human Diseases.” Journal of Biomedical Science 27 (1). https://doi.org/10.1186/s12929-020-00658-7.

Draper, Garrison, Matthew D. Wright, Ai Ishida, Paul Chesterton, Matthew Portas, and Greg Atkinson. 2022. “Do Environmental Temperatures and Altitudes Affect Physical Outputs of Elite Football Athletes in Match Conditions? A Systematic Review of the ‘Real World’ Studies.” Science and Medicine in Football 7 (1): 81–92. https://doi.org/10.1080/24733938.2022.2033823.

Edwards, Lindsay M., Andrew J. Murray, Damian J. Tyler, Graham J. Kemp, Cameron J. Holloway, Peter A. Robbins, Stefan Neubauer, et al. 2010. “The Effect of High-Altitude on Human Skeletal Muscle Energetics: 31P-MRS Results from the Caudwell Xtreme Everest Expedition.” Edited by Conrad P. Earnest. PLoS ONE 5 (5): e10681. https://doi.org/10.1371/journal.pone.0010681.

Fang, Hua, and Indy Man Kit Ho. 2020. “Intraday Reliability, Sensitivity, and Minimum Detectable Change of National Physical Fitness Measurement for Preschool Children in China.” Edited by Subas Neupane. PLOS ONE 15 (11): e0242369. https://doi.org/10.1371/journal.pone.0242369.

Faude, Oliver, Thorsten Koch, and Tim Meyer. 2012. “Straight Sprinting Is the Most Frequent Action in Goal Situations in Professional Football.” Journal of Sports Sciences 30 (7): 625–31. https://doi.org/10.1080/02640414.2012.665940.

Gore, Christopher John, Sally A. Clark, and Philo U. Saunders. 2007. “Nonhematological Mechanisms of Improved Sea-Level Performance After Hypoxic Exposure.” Medicine & Science in Sports & Exercise 39 (9): 1600–1609. https://doi.org/10.1249/mss.0b013e3180de49d3.

Gregson, W., B. Drust, G. Atkinson, and V. Salvo. 2010. “Match-to-Match Variability of High-Speed Activities in Premier League Soccer.” International Journal of Sports Medicine 31 (04): 237–42. https://doi.org/10.1055/s-0030-1247546.

Gualtieri, Antonio, Ermanno Rampinini, Antonio Dello Iacono, and Marco Beato. 2023. “High-Speed Running and Sprinting in Professional Adult Soccer: Current Thresholds Definition, Match Demands and Training Strategies. A Systematic Review.” Frontiers in Sports and Active Living 5 (February). https://doi.org/10.3389/fspor.2023.1116293.

Noakes, Timothy D. 2007. “The Central Governor Model of Exercise Regulation Applied to the Marathon.” Sports Medicine 37 (4): 374–77. https://doi.org/10.2165/00007256-200737040-00026.

Peronnet, F., G. Thibault, and D. L. Cousineau. 1991. “A Theoretical Analysis of the Effect of Altitude on Running Performance.” Journal of Applied Physiology 70 (1): 399–404. https://doi.org/10.1152/jappl.1991.70.1.399.

Rojas-Valverde, Daniel, Jose Alexis Ugalde Ramírez, Braulio Sánchez-Ureña, and Randall Gutiérrez-Vargas. 2019. “Influence of Altitude and Environmental Temperature on Muscle Functional and Mechanical Activation After 30’ Time Trial Run.” MHSalud: Revista En Ciencias Del Movimiento Humano y Salud 17 (1): 1–15. https://doi.org/10.15359/mhs.17-1.2.

Sparks, Martinique, Ben Coetzee, and J. Tim Gabbett. 2016. “Variations in High-Intensity Running and Fatigue During Semi-Professional Soccer Matches.” International Journal of Performance Analysis in Sport 16 (1): 122–32. https://doi.org/10.1080/24748668.2016.11868875.