Does Altitude Effect Match-Day Physical Performance and Post-Match Soreness of Elite Footballers?

The data presented in this report were collected over four Major League Soccer (MLS) seasons and includes data from 19 outfield players from Philadelphia Union. This report investigates the impact of three altitudes classifications - (Sea (0-100m), Low (1000-2000m), Medium (>2000m) - on match-day physical outputs, with a particular focus on high-speed running (HSR) and post-match soreness.

The impact of altitude on match-day performance and post-match soreness is underpinned by well-established physiological responses to hypoxia. Reduced oxygen availability at altitude lowers arterial oxygen saturation, which impairs aerobic energy production and limits the capacity for sustained high-intensity efforts (GORE, CLARK, and SAUNDERS 2007). As a result, athletes may rely more heavily on anaerobic pathways, leading to greater lactate accumulation, increased neuromuscular fatigue and heightened delayed onset muscle soreness(Chapman, Stray-Gundersen, and Levine 1998). Acute responses, without time to acclimatise, has been shown to reduce total distance covered and sprint frequency in elite footballers during match play (Aughey et al. 2013). The Central Governor Theory also posits that the brain may downregulate muscle recruitment in hypoxic conditions to protect homeostasis, further affecting the output (Noakes 2007).

The first step of analysis was to install the necessary packages.

#installing packages
library(ggplot2)
library(readxl)
library(googlesheets4)
library(janitor)
library(tidyverse)
library(lme4)
library(emmeans)
library(dplyr)
library(esquisse)
library(effectsize)
library(shinydashboard)
library(shiny)
library(kableExtra)
library(readr)
library(rsconnect)

Set up working directory.

setwd("~/Library/CloudStorage/OneDrive-Personal/Teesside University/Semester 2/R Studio/Work Area/Assessment")

Imported data.

#import data
data <- read_csv("Assessment Data.csv")
    view(data)

Column names cleaned to put them all in a consistent format by removing special characters, spaces and uppercase letters - helps make the data look tidier and more consistent to help avoid errors.

  data <- clean_names(data)

As the below variables are categorical they were changed to factors.

 colnames(data)

 [1] "id"                 "date"               "altitude_code"     
 [4] "gd"                 "replication"        "timein_environment"
 [7] "altitude"           "duration"           "distance"          
[10] "distancemin"        "high_speed_running" "hs_rmin"           
[13] "player_load"        "playerloadmin"      "hr_zone3time"      
[16] "hr_zone4time"       "hr_zone5time"       "hr_zone6time"      
[19] "h_rzone_trimp"      "soreness_1"         "mood_1"            
[22] "nutrition_1"        "rpe_leg"            "rpe_breathe"       
[25] "rpe_tech"           "rpe_session"

   data$id <- as.factor(data$id)  
   #data$date <- as.Date(data$date,format="%m-%d-%Y") #date being an issue
   data$altitude_code<- as.factor(data$altitude_code)
   data$gd<- as.factor(data$gd)
   data$replication<- as.factor(data$replication)
   
   view(data)

The names of the altitude levels were changed from 1,2,3 to help with clarity and prevent confusion.

data <- data %>%           #renamed altitude levels for clarity
     mutate(altitude_code = recode(altitude_code, 
                                   `1` = "Sea", 
                                   `2` = "Low", 
                                   `3` = "Medium"))

The same was done with the training codes.

data <- data %>%           #renamed game day for clarity
     mutate(gd = recode(gd, 
                                   `0` = "MD", 
                                   `1` = "MD-1", 
                                   `2` = "MD-2"))

Data that was not needed was immediately removed. Heart rate had lots of missing data making it difficult to utilise.

#wrangle
    data <- data %>%     #removed heart rate data as was very sparse with lots of missing values
     select(-c(hr_zone3time, hr_zone4time, hr_zone5time, hr_zone6time, h_rzone_trimp))
   view(data)

Once the above steps were completed, I decided on the variables to analyse.

Through analysis of existing research, it is evident that previous studies have shown altitude exposure to reduce total distance and sprint frequency in footballers (Aughey et al. 2013). Consequently, I was interested in exploring whether a similar trend would emerge in relation to HSR.

I wanted to visualise and see if altitude had any effect on HSR (m/min) at the different altitude levels. HSR was also selected due to its importance in team sports and is considered a key determinant for successful performance (Gualtieri et al. 2023). HSR (m/min) was utilised over HSR (m) as HSR (m/min) is a relative measure and therefore accounts for, and enables, fair comparisons across players with different playing duration.

ggplot(data %>% filter(gd == "MD")) + #filter for only match days 
     aes(x = altitude_code, y = hs_rmin, fill = altitude_code) + #chose hs_rmin to diminish the effects of minutes played. 
     geom_boxplot() +
     geom_point( colour = "red") + #adding in the data points and colouring red
     scale_fill_manual(
       values = c(Sea = "#011925",
                  Low = "#418cdd",
                  Medium = "#c3a871")
     ) +
     theme_classic() +
     labs(
       x = "Altitude Category",  # Rename x-axis label
       y = "High-Speed Running (m/min)",  # Rename y-axis label
       fill = "Altitude Level",  # Rename legend title
       title = "The Impact of Altitude on High-Speed Running (m/min) Performance"
     )

  match_data<- data %>% #rename dataset
    filter(gd == "MD")

Following this, I then wanted to analyse the same variables, HSR (m/min) and altitude, but whilst taking into account different participants in each category.

To analyse and interpret this data I used a linear model to assess the relationship between altitude exposure and various performance metrics, such as HSR and soreness (later in the analysis). This approach is widely used in sport science to account for continuous variables and control for confounders. Following the linear model I applied estimated marginal means (EMM) to compare the adjusted means across conditions which allowed for a more accurate interpretation of the effects of altitude on performance. This method is particularly useful for estimating differences between groups while accounting for other covariates, ensuring robust and reliable results.

#analysing HSR at different altitude levels whilst taking into account different people in each category
  l_model1 <- lmer(hs_rmin ~ altitude_code + (1 | id), match_data )
   summary(l_model1)

Linear mixed model fit by REML ['lmerMod']
Formula: hs_rmin ~ altitude_code + (1 | id)
   Data: match_data

REML criterion at convergence: 167.5

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.59107 -0.56127 -0.05725  0.47690  2.02692 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 3.737    1.933   
 Residual             1.620    1.273   
Number of obs: 42, groups:  id, 20

Fixed effects:
                    Estimate Std. Error t value
(Intercept)           6.2710     0.5397  11.619
altitude_codeLow     -0.9842     0.5124  -1.921
altitude_codeMedium  -0.3737     0.6232  -0.600

Correlation of Fixed Effects:
            (Intr) altt_L
altitd_cdLw -0.360       
alttd_cdMdm -0.315  0.250

 # comparing between altitudes   
   emm <- emmeans(l_model1, pairwise ~ altitude_code)
  
  #confidence interval (95%) from estimated marginal means
  confint(emm)

$emmeans
 altitude_code emmean    SE   df lower.CL upper.CL
 Sea             6.27 0.543 25.5     5.15     7.39
 Low             5.29 0.600 31.9     4.06     6.51
 Medium          5.90 0.692 37.5     4.50     7.30

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast     estimate    SE   df lower.CL upper.CL
 Sea - Low       0.984 0.521 24.5   -0.314     2.28
 Sea - Medium    0.374 0.636 25.8   -1.207     1.95
 Low - Medium   -0.610 0.718 26.9   -2.390     1.17

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 
Conf-level adjustment: tukey method for comparing a family of 3 estimates

  print (emm)

$emmeans
 altitude_code emmean    SE   df lower.CL upper.CL
 Sea             6.27 0.543 25.5     5.15     7.39
 Low             5.29 0.600 31.9     4.06     6.51
 Medium          5.90 0.692 37.5     4.50     7.30

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast     estimate    SE   df t.ratio p.value
 Sea - Low       0.984 0.521 24.5   1.890  0.1628
 Sea - Medium    0.374 0.636 25.8   0.588  0.8279
 Low - Medium   -0.610 0.718 26.9  -0.851  0.6752

Degrees-of-freedom method: kenward-roger 
P value adjustment: tukey method for comparing a family of 3 estimates

From the outputs provided from the code above, the following observations can be made:

sea-Low: players at low altitude had 0.984 m/min less HSR than at sea level (P=0.1628)

Sea-High: HSR at medium altitude was 0.374 m/min lower than at sea level (P=0.8279)

Low-High: HSR at medium altitude was 0.610 m/min higher at low altitude (P=0.6752)

In conclusion, there was no significant differences between altitude category and HSR (m/min). These findings align with previous research that reports altitude has little to no effect on HSR (Draper et al. 2022).

Following these findings, individual trends were investigated by plotting each player’s HSR (m/min) against altitude.

#Checking individual trends by plotting each player’s HSR vs. altitude
  ggplot(match_data, aes(x = altitude_code, y = hs_rmin, group = id, color = id)) +
    geom_line() +
    geom_point() +
    theme_classic() +
    labs(title = "Player-Specific Trends in HSR Across Altitudes",
         x = "Altitude Category",
         y = "High-Speed Running (m/min)") +
    theme(legend.position = "none")

This graph wasn’t overly useful due to the small number of individuals who had dater reported for all three altitudes. A key limitation of the data used in this report is the lack of completeness, which restricts the depth of analysis. To strengthen future investigations, increasing participant numbers and ensuring consistent data compliance is essential.

Following all the above findings, I decided to combine low and medium altitude together to compare data at sea level to altitude (>1000m) to try reduce noise and make the comparisons more focused.

 #combining low and high altitude to compare to sea level to reduce noise and make comparison more focused
  match_data <- match_data %>%
    mutate(altitude_group = ifelse(altitude_code == "Sea", "Sea", "Altitude"))
  view(data)

I repeated the graph previously presented but this time looking at sea level and altitude as opposed to three different altitude categories.

#boxplot to show HSR(m/min) at sea level and altitude
 ggplot(match_data, aes(x = altitude_group, y = hs_rmin, fill = altitude_group)) +
    geom_boxplot() +
    geom_point(color = "red") +
    scale_fill_manual(
      values = c("Sea" = "#011925", "Altitude" = "#c3a871")  # Sea and Altitude colors
    ) +
    theme_classic() +
    theme(legend.position = "none") +  # Remove the legend
    labs(
      x = "Altitude Group",
      y = "High-Speed Running (m/min)",
      title = "Comparison of HSR/min Between Sea Level and Altitude"
    )

Following this, analysis of HSR (m/min) and altitude (sea and altitude) was carried out but whilst taking into account different participants in each category.

#linear mix model with new altitude groups 
   l_model2 <- lmer(hs_rmin ~ altitude_group + (1 | id), data = match_data)
  summary(l_model2)

Linear mixed model fit by REML ['lmerMod']
Formula: hs_rmin ~ altitude_group + (1 | id)
   Data: match_data

REML criterion at convergence: 169.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.3712 -0.5724 -0.1238  0.4642  1.9179 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 3.567    1.889   
 Residual             1.648    1.284   
Number of obs: 42, groups:  id, 20

Fixed effects:
                  Estimate Std. Error t value
(Intercept)         5.5190     0.5242  10.529
altitude_groupSea   0.7564     0.4439   1.704

Correlation of Fixed Effects:
            (Intr)
altitd_grpS -0.403

  #run stats test to see if there's significance for any individual between sea level and altitude
  #shows altitude is not changing within player HSR (m/min)?? Altitude is not affecting HSR (m/min). 
  l_model3 <- lmer(hs_rmin ~ 1 + (1 | id), data = match_data)
  summary(l_model3)

Linear mixed model fit by REML ['lmerMod']
Formula: hs_rmin ~ 1 + (1 | id)
   Data: match_data

REML criterion at convergence: 172.4

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.68343 -0.57864 -0.07324  0.53997  2.03570 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 3.472    1.863   
 Residual             1.792    1.339   
Number of obs: 42, groups:  id, 20

Fixed effects:
            Estimate Std. Error t value
(Intercept)   5.8744     0.4791   12.26

  # comparing between sea level and altitude
  emm2 <- emmeans(l_model2, pairwise ~ altitude_group)
  emm2

$emmeans
 altitude_group emmean    SE   df lower.CL upper.CL
 Altitude         5.52 0.526 25.1     4.44     6.60
 Sea              6.28 0.536 26.0     5.17     7.38

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast       estimate   SE   df t.ratio p.value
 Altitude - Sea   -0.756 0.45 25.4  -1.679  0.1053

Degrees-of-freedom method: kenward-roger

  #confidence interval (95%) from estimated marginal means
  confint(emm2)

$emmeans
 altitude_group emmean    SE   df lower.CL upper.CL
 Altitude         5.52 0.526 25.1     4.44     6.60
 Sea              6.28 0.536 26.0     5.17     7.38

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast       estimate   SE   df lower.CL upper.CL
 Altitude - Sea   -0.756 0.45 25.4    -1.68    0.171

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95

On average players ran 0.756 m/min less at altitude compared to sea level however this difference is not statistically significant (P=0.1053). As the confidence interval crosses zero, it further reinforces that the effect is not statistically significant. Although P=0.1053 and therefore isn’t significant (P>0.05), it is closer to being significant than when looked at over 3 altitudes. This suggests altitude may have some effect on HSR (m/min) but the current sample size or variability in data is preventing it reaching statistical significance.

Similar to before, I then checked the individual trends among the players.

 #Checking individual trends by plotting each player’s HSR vs. altitude
  ggplot(match_data, aes(x = altitude_group, y = hs_rmin, group = id, color = id)) +
    geom_line(alpha = 0.6) +   # Connects points for each player
    geom_point(size = 3) +     # Shows individual data points
    theme_classic() +
    labs(title = "Individual HSR Trends Across Altitude",
         x = "Altitude Category",
         y = "High-Speed Running (m/min)") +
    theme(legend.position = "none")

Research shows that reductions in HSR are commonly observed during the second half of matches (Sparks, Coetzee, and Gabbett 2016). Consequently, further research and analysis should consider separating HSR data by halves to provide more detailed insights. This approach could also support conclusions regarding the impact of fatigue, particularly if rating of perceived exertion were collected more frequently during match play.

Furthermore, research findings demonstrate that high-intensity actions, such as HSR, are key determinants of goals scoring opportunities (Faude, Koch, and Meyer 2012) and therefore, if match results were provided, further research could have investigated if match outcome was affected by HSR outputs.

Following the findings above, and due to the paucity in research, the effects of altitude on post-match soreness were investigated.

Mean soreness and mean HSR (m/min) per player for a match days were calculated.

 match_data <- data %>% 
    filter(gd == "MD")  # Keep only Match Day data
  
  soreness_1 <- match_data %>%
    group_by(id, altitude_code) %>%  # Grouped by player ID and altitude code
    summarize(
      mean_soreness_1 = mean(soreness_1, na.rm = TRUE),  # Mean soreness per player
      mean_hs_rmin = mean(hs_rmin, na.rm = TRUE)  # Mean high-speed running per player
    )

`summarise()` has grouped output by 'id'. You can override using the `.groups`
argument.

summary(soreness_1)

       id     altitude_code mean_soreness_1  mean_hs_rmin   
 10     : 3   Sea   :14     Min.   :2.000   Min.   : 2.611  
 109    : 3   Low   :11     1st Qu.:4.000   1st Qu.: 4.478  
 2      : 2   Medium: 8     Median :6.000   Median : 5.668  
 8      : 2                 Mean   :5.462   Mean   : 5.795  
 27     : 2                 3rd Qu.:6.833   3rd Qu.: 6.875  
 64     : 2                 Max.   :8.000   Max.   :11.778  
 (Other):19                 NA's   :2

The effect of altitude on soreness over the three different altitude groups was then visually displayed.

# Create a boxplot comparing soreness across all three altitude groups (Sea, Low, Medium)
ggplot(soreness_1, aes(x = altitude_code, y = mean_soreness_1, fill = altitude_code)) +
  geom_boxplot() +  # Create the boxplot
  geom_point(color = "red", size = 2, position = position_dodge(width = 0.75)) +  # Add red points in a straight line
  scale_fill_manual(
    values = c("Sea" = "#011925", 
               "Low" = "#418cdd", 
               "Medium" = "#c3a871")  # Assign different colors to each altitude
  ) +
  theme_classic() +  # Use classic theme
  theme(legend.position = "none") +  # Remove the legend
  labs(
    x = "Altitude Code",  # Label for the x-axis
    y = "Mean Soreness",  # Label for the y-axis
    fill = "Altitude Group",  # Legend title
    title = "Comparison of Soreness Across Sea, Low, and Medium Altitudes"
  )

Statistical tests were then ran to determine whether the data showed real, meaningful patterns and not just random noise.

# Fit a mixed-effects model (altitude_code as fixed effect, id as random effect)
lm_model_mixed <- lmer(mean_soreness_1 ~ altitude_code + (1 | id), data = soreness_1)

boundary (singular) fit: see help('isSingular')

summary(lm_model_mixed)

Linear mixed model fit by REML ['lmerMod']
Formula: mean_soreness_1 ~ altitude_code + (1 | id)
   Data: soreness_1

REML criterion at convergence: 95.8

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.7707 -0.2348 -0.1057  0.6148  1.5849 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 0.0      0.000   
 Residual             1.4      1.183   
Number of obs: 31, groups:  id, 19

Fixed effects:
                     Estimate Std. Error t value
(Intercept)          6.277778   0.341506  18.383
altitude_codeLow    -0.005051   0.493817  -0.010
altitude_codeMedium -3.152778   0.539968  -5.839

Correlation of Fixed Effects:
            (Intr) altt_L
altitd_cdLw -0.692       
alttd_cdMdm -0.632  0.437
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

# Calculate the estimated marginal means (EMMs) for each altitude code
emm_result <- emmeans(lm_model_mixed, pairwise ~ altitude_code)
summary(emm_result)

$emmeans
 altitude_code emmean    SE df lower.CL upper.CL
 Sea             6.28 0.352 28     5.56     7.00
 Low             6.27 0.369 28     5.52     7.03
 Medium          3.12 0.439 28     2.23     4.02

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast     estimate    SE   df t.ratio p.value
 Sea - Low     0.00505 0.509 19.5   0.010  0.9999
 Sea - Medium  3.15278 0.562 23.6   5.606  <.0001
 Low - Medium  3.14773 0.575 25.1   5.471  <.0001

Degrees-of-freedom method: kenward-roger 
P value adjustment: tukey method for comparing a family of 3 estimates

# Confidence intervals (95%) for the EMMs
confint(emm_result)

$emmeans
 altitude_code emmean    SE df lower.CL upper.CL
 Sea             6.28 0.352 28     5.56     7.00
 Low             6.27 0.369 28     5.52     7.03
 Medium          3.12 0.439 28     2.23     4.02

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast     estimate    SE   df lower.CL upper.CL
 Sea - Low     0.00505 0.509 19.5    -1.28     1.29
 Sea - Medium  3.15278 0.562 23.6     1.75     4.56
 Low - Medium  3.14773 0.575 25.1     1.71     4.58

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 
Conf-level adjustment: tukey method for comparing a family of 3 estimates

print(emm_result)

$emmeans
 altitude_code emmean    SE df lower.CL upper.CL
 Sea             6.28 0.352 28     5.56     7.00
 Low             6.27 0.369 28     5.52     7.03
 Medium          3.12 0.439 28     2.23     4.02

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast     estimate    SE   df t.ratio p.value
 Sea - Low     0.00505 0.509 19.5   0.010  0.9999
 Sea - Medium  3.15278 0.562 23.6   5.606  <.0001
 Low - Medium  3.14773 0.575 25.1   5.471  <.0001

Degrees-of-freedom method: kenward-roger 
P value adjustment: tukey method for comparing a family of 3 estimates

From the tests it is clear to see that the difference from sea to medium and low to medium is statistically significant (P<0.0001), whilst the changes from sea to low is not statistically significant (P=0.9999). Further to this, the estimated marginal means suggest that both sea level and low altitude have similar average soreness levels (6.28 and 6.27, respectively), whilst medium altitude shows a significantly lower estimated soreness level of 3.12. This suggests that playing at altitudes above 2000m impacts soreness.

Similar with HSR, low and medium altitude were then combined to form ‘altitude’ so direct comparison of sea level to altitude could be determined.

# Recode altitude_code to combine 'Low' and 'Medium' into 'Altitude', keeping 'Sea' as 'Sea'
soreness_1_combined <- soreness_1 %>%
  mutate(altitude_code = ifelse(altitude_code == "Sea", "Sea", "Altitude"))

I then viewed this as a boxplot.

# Reorder factor levels so that 'Sea' is on the left and 'Altitude' is on the right
soreness_1_combined <- soreness_1_combined %>%
  mutate(altitude_code = factor(altitude_code, levels = c("Sea", "Altitude")))

# Create a boxplot comparing soreness between Sea and combined Altitude (Low and Medium)
ggplot(soreness_1_combined, aes(x = altitude_code, y = mean_soreness_1, fill = altitude_code)) +
  geom_boxplot() +  # Create the boxplot
  geom_point(color = "red", size = 2, position = position_dodge(width = 0.75)) +  # Add red points in a straight line
  scale_fill_manual(
    values = c("Sea" = "#011925", 
               "Altitude" = "#c3a871")  # Combine Low and Medium into 'Altitude'
  ) +
  theme_classic() +  # Use classic theme
  labs(
    x = "Altitude Group",  # Label for the x-axis
    y = "Mean Soreness",  # Label for the y-axis
    fill = "Altitude Group",  # Legend title
    title = "Comparison of Soreness Between Sea Level and Altitude (Low and Medium Combined)"
  )

I ran some more statistical tests.

# Fit a mixed-effects model (altitude_code as fixed effect, id as random effect)
lm_model_combined_random <- lmer(mean_soreness_1 ~ altitude_code + (1 | id), data = soreness_1_combined)

boundary (singular) fit: see help('isSingular')

summary(lm_model_combined_random)

Linear mixed model fit by REML ['lmerMod']
Formula: mean_soreness_1 ~ altitude_code + (1 | id)
   Data: soreness_1_combined

REML criterion at convergence: 118.9

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.91369 -0.55311  0.03073  0.61457  1.78224 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 0.000    0.000   
 Residual             2.934    1.713   
Number of obs: 31, groups:  id, 19

Fixed effects:
                      Estimate Std. Error t value
(Intercept)             6.2778     0.4944  12.697
altitude_codeAltitude  -1.3304     0.6316  -2.107

Correlation of Fixed Effects:
            (Intr)
alttd_cdAlt -0.783
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

# Calculate the estimated marginal means (EMMs) for the combined altitude groups (Sea vs. Altitude)
emm_result_combined <- emmeans(lm_model_combined_random, pairwise ~ altitude_code)
summary(emm_result_combined)

$emmeans
 altitude_code emmean    SE   df lower.CL upper.CL
 Sea             6.28 0.509 29.0     5.24     7.32
 Altitude        4.95 0.408 25.8     4.11     5.79

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast       estimate    SE df t.ratio p.value
 Sea - Altitude     1.33 0.651 21   2.045  0.0537

Degrees-of-freedom method: kenward-roger

# Confidence intervals (95%) for the EMMs
confint(emm_result_combined)

$emmeans
 altitude_code emmean    SE   df lower.CL upper.CL
 Sea             6.28 0.509 29.0     5.24     7.32
 Altitude        4.95 0.408 25.8     4.11     5.79

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast       estimate    SE df lower.CL upper.CL
 Sea - Altitude     1.33 0.651 21   -0.023     2.68

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95

print(emm_result_combined)

$emmeans
 altitude_code emmean    SE   df lower.CL upper.CL
 Sea             6.28 0.509 29.0     5.24     7.32
 Altitude        4.95 0.408 25.8     4.11     5.79

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

$contrasts
 contrast       estimate    SE df t.ratio p.value
 Sea - Altitude     1.33 0.651 21   2.045  0.0537

Degrees-of-freedom method: kenward-roger

When comparing sea level to altitude, with altitude including all matches played at or above 1000m, the changes in soreness are not statistically significant (P=0.0537). However, as the p-value is very close to the significance threshold of 0.05, it suggests there is a possible effect that could become statistically significant with a larger sample size or more data points. This implies that the current sample may not have enough power to detect the effect conclusively. Further analysis could provide more clarity and confirm if this trend reflects a true difference in soreness between the two conditions.

The results demonstrate that soreness will, on average, increase by 1.33 AU at altitude. The confidence intervals for this chanage are 0.023 and 2.68 so whilst the lower confidence interval is negligible, the upper confidence interval is substantial. This should be considered when programming and incorporating recovery strategies as an increase of 1 on the Likert Scale is the smallest worwhile change.

There is limited research exploring the effect of altitude on soreness levels. However, the findings from this analysis align with those of Rojas-Valverde et al (2019), who observed increased levels of delayed-onset muscle soreness (DOMS) at higher altitudes (Rojas-Valverde et al. 2019). It is important to note, however, that Rojas-Valverde et al (2019) also included heat as a variable in their study, which could have influenced their results. Therefore, while our findings suggest a similar trend, further research isolating altitude as the primary factor is needed to confirm the relationship between altitude and soreness levels.

While our findings show an increase in soreness at altitude, research has reported that higher altitude hypoxia does not significantly affect muscle function (Edwards et al. 2010), which implies that other factors might be contributing to the observed soreness. Given this, it is possible that environmental stressors, such as temperature, humidity, or the body’s acclimatisation to altitude, rather than altitude alone, are influencing soreness levels. Again, more controlled studies isolating altitude as the primary factor are needed to confirm its direct impact on soreness. Furthermore, monitoring soreness over the next few days following a match would be useful to analyse how long it takes for soreness levels to normalise between individuals.

A graph was generated to plot mean match-day soreness and HSR (m/min) with altitude.

ggplot(soreness_1, aes(x = mean_soreness_1, y = mean_hs_rmin, color = altitude_code)) + 
  geom_point(size = 3, alpha = 0.8) +  # Scatter points
  geom_smooth(method = "lm", se = FALSE) +  # Add trend lines per altitude
  scale_color_manual(
    values = c(Sea = "#082143", Low = "#283FCB", Medium = "#E6D23E")
  ) +
  theme_minimal() +
  labs(
    x = "Mean Soreness Match Day +1",
    y = "Mean High-Speed Running (m/min)",
    color = "Altitude Level",
    title = "Relationship Between Soreness and High-Speed Running at Different Altitudes"
  )

`geom_smooth()` using formula = 'y ~ x'

The trend lines show that for moderate altitude and sea level, the higher HSR (m/min) output, the higher post-match soreness however, at low altitude the opposite was observed, with the greater HSR (m/min) relating to lower soreness reporting.

From the code and findings above, I then generated a interactive dashboard to display to the coaches to provide the key messages.

#Shiny CODE

In relation to the lollipop graphs displayed in this dashboard, the grey shaded region spans from -1 to 1, representing the zone of trivial change - a range within which differences are not considered practically meaningful. This is based on the concept of the smallest worthwhile change (SWC), which, for performance, is often set at 0.2 times the between-subject standard deviation (Fang and Ho 2020). By using a fixed range such as -1 to 1, we provide a visual threshold for interpreting whether observed differences exceed what could be attributed to natural variability or measurement noise.

Research has shown that match-to-match variability in high-speed running is approximately 16% (Gregson et al. 2010). Therefore, to calculate the SWC in high-speed running, 16% of the mean HSR was used, resulting in a value of 0.93. As a result, a threshold of ±1 was applied.

In relation to soreness, the SWC was calculated from taking the mean match-day soreness over the 3 altitudes and multiplying it by 0.2 ((6.28 + 6.27 + 3.12) /3 = 5.22 x 0.2 = 1.04. Therefore, a threshold of ±1 was utilised for soreness.

From the lollipop graphs it is important to note no player increased in HSR (m/min) or reported less soreness outside of the smallest worhtwhile change (+1) whereas the magnitude of HSR(m/min) decrease and soreness increase was much larger. The data suggests that whiolst there wasn’t a meaningful improvement in performance or recovery (soreness reduction), some players experienced a notable decline in performance (HSR) and an increase in soreness at higher altitudes. This could indicate tat altitude negatively affects some players’ physical performance and recovery, but no positive changes (increase in HSR or decrease in soreness) were observed beyond the minimal threshold.

Practical Application:

Impact of Match-Day Performance

High-Speed Running (m/min) shows no significant difference across altitude categories, suggesting that performance during match play is not heavily impacted by altitude.
Recovery and Soreness

Soreness ratings were higher at moderate altitudes (>2000m), indicating increased muscle fatigue and soreness. This could be attributed to lower atmospheric pressure and reduced oxygen availability at higher altitudes.
Altitude and Recovery Strategies

Although altitude may not cause a drastic decline in performance, it’s influence on post-match soreness and overall readiness highlights the need for enhanced recovery protocols - such as extended recovery periods and adjusted training loads - for athletes competing at higher altitudes. Effectively managing post-match soreness, through constant data collection, is particularly important in these environments to maintain performance and reduce injury risk.
Further Research

These findings underscore the need for further research into the long-term effects of training and playing at higher altitudes, as well as the development of effective recovery startegies to optimise athlete performance and well-being in these challenging environments.

References

Aughey, Robert J, Kristal Hammond, Matthew C Varley, Walter F Schmidt, Pitre C Bourdon, Martin Buchheit, Ben Simpson, et al. 2013. “Soccer Activity Profile of Altitude Versus Sea-Level Natives During Acclimatisation to 3600m (ISA3600).” British Journal of Sports Medicine 47 (Suppl 1): i107–13. https://doi.org/10.1136/bjsports-2013-092776.

Chapman, Robert F., James Stray-Gundersen, and Benjamin D. Levine. 1998. “Individual Variation in Response to Altitude Training.” Journal of Applied Physiology 85 (4): 1448–56. https://doi.org/10.1152/jappl.1998.85.4.1448.

Draper, Garrison, Matthew D. Wright, Ai Ishida, Paul Chesterton, Matthew Portas, and Greg Atkinson. 2022. “Do Environmental Temperatures and Altitudes Affect Physical Outputs of Elite Football Athletes in Match Conditions? A Systematic Review of the ‘Real World’ Studies.” Science and Medicine in Football 7 (1): 81–92. https://doi.org/10.1080/24733938.2022.2033823.

Edwards, Lindsay M., Andrew J. Murray, Damian J. Tyler, Graham J. Kemp, Cameron J. Holloway, Peter A. Robbins, Stefan Neubauer, et al. 2010. “The Effect of High-Altitude on Human Skeletal Muscle Energetics: 31P-MRS Results from the Caudwell Xtreme Everest Expedition.” Edited by Conrad P. Earnest. PLoS ONE 5 (5): e10681. https://doi.org/10.1371/journal.pone.0010681.

Fang, Hua, and Indy Man Kit Ho. 2020. “Intraday Reliability, Sensitivity, and Minimum Detectable Change of National Physical Fitness Measurement for Preschool Children in China.” Edited by Subas Neupane. PLOS ONE 15 (11): e0242369. https://doi.org/10.1371/journal.pone.0242369.

Faude, Oliver, Thorsten Koch, and Tim Meyer. 2012. “Straight Sprinting Is the Most Frequent Action in Goal Situations in Professional Football.” Journal of Sports Sciences 30 (7): 625–31. https://doi.org/10.1080/02640414.2012.665940.

GORE, CHRISTOPHER JOHN, SALLY A. CLARK, and PHILO U. SAUNDERS. 2007. “Nonhematological Mechanisms of Improved Sea-Level Performance After Hypoxic Exposure.” Medicine & Science in Sports & Exercise 39 (9): 1600–1609. https://doi.org/10.1249/mss.0b013e3180de49d3.

Gregson, W., B. Drust, G. Atkinson, and V. Salvo. 2010. “Match-to-Match Variability of High-Speed Activities in Premier League Soccer.” International Journal of Sports Medicine 31 (04): 237–42. https://doi.org/10.1055/s-0030-1247546.

Gualtieri, Antonio, Ermanno Rampinini, Antonio Dello Iacono, and Marco Beato. 2023. “High-Speed Running and Sprinting in Professional Adult Soccer: Current Thresholds Definition, Match Demands and Training Strategies. A Systematic Review.” Frontiers in Sports and Active Living 5 (February). https://doi.org/10.3389/fspor.2023.1116293.

Noakes, Timothy D. 2007. “The Central Governor Model of Exercise Regulation Applied to the Marathon.” Sports Medicine 37 (4): 374–77. https://doi.org/10.2165/00007256-200737040-00026.

Rojas-Valverde, Daniel, Jose Alexis Ugalde Ramírez, Braulio Sánchez-Ureña, and Randall Gutiérrez-Vargas. 2019. “Influence of Altitude and Environmental Temperature on Muscle Functional and Mechanical Activation After 30’ Time Trial Run.” MHSalud: Revista En Ciencias Del Movimiento Humano y Salud 17 (1): 1–15. https://doi.org/10.15359/mhs.17-1.2.

Sparks, Martinique, Ben Coetzee, and J. Tim Gabbett. 2016. “Variations in High-Intensity Running and Fatigue During Semi-Professional Soccer Matches.” International Journal of Performance Analysis in Sport 16 (1): 122–32. https://doi.org/10.1080/24748668.2016.11868875.