The Challenge
The Canadian National Women’s Rugby Team seeks your advice on the role of workload and fatigue in Rugby 7s. Rugby 7s is a fast-paced, physically demanding sport that pushes the limits of athlete speed, endurance and toughness. Rugby 7s players may play in up to three games in a day, resulting in a tremendous amount of athletic exertion. Substantial exertion results in fatigue, which may lead to physiological deficits (e.g., dehydration), reduced athletic performance, and greater risk of injury. Despite the importance of managing player fatigue in professional athletics, very little is known about its effects, and many training decisions are based on “gut feel.” Currently, training load is measured through a combination of subjective measurements (asking players how hard they worked) and objective measurements from wearable technology. Fatigue is typically estimated by asking players how they feel in wellness surveys. However, there is no agreed-upon standard for defining fatigue, so the relationship between workload and fatigue is unclear. In this challenge, we encourage you to explore new ways of measuring fatigue and examine its effects on players’ performance and physical wellness. The datasets provide a number of observations that we believe will be useful to measure fatigue in players of the Canadian National Women’s Rugby Team in the 2017-2018 season. Remember that training load is not the same as fatigue, and one question to explore is whether you can find evidence that some measures of training load are better than others.
tl;dr
- Canadian women’s Rugby 7s players play up to three games in a day, leading to negative physiological effects.
- Little is known about the effects of player fatigue, so training decisions are made on intuition.
- Training load is currently measured through:
  - Subjective measurements in the form of surveys about how hard players worked
  - “Objective” measures from wearable devices
- The relationship between workload and fatigue is unclear because there is no agreed-upon standard for defining fatigue.
- The challenges:
  - Explore new ways of measuring fatigue
  - Examine the effects of fatigue on players’ performance
  - Examine the effects of fatigue on players’ physical wellness
Some issues to consider:
- How reliable are subjective wellness data? Can you quantify the individual variation in self-reported data and use this to adjust measures of wellness?
- Should the quality of the opponent or the outcome of the game be considered when examining fatigue during a game?
- Some accepted (and even widely used) measurements of training load or fatigue are naive. For example, you’ll find in these data a “Monitoring Score” which simply sums the values of other subjective scores in an attempt to create a single overall measure of fatigue. Is a simple sum useful? Or can it be improved? For example, are all components of this Monitoring Score needed? Are some more important than others, and why?
- Be wary of missing variables. Most often they indicate that a player simply did not provide information or that sensors were not functioning. But in some situations values are missing because they are not meaningful in a certain context. You’ll find that a one-size-fits-all approach is not useful.
- You will find it tempting to use the location data to help inform on-field strategy. We advise against this because it is unlikely to help you understand fatigue. The location data are provided to help you study fatigue. For example, they could be used to verify hypotheses or to evaluate player fatigue in different positions (e.g., how does a player’s position contribute to their fatigue?).
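As a small illustration of why missing values need context, the sketch below separates a USG value that is absent because no hydration test was done from one that is absent despite a test. The four rows mirror the first USGMeasurement and USG values printed by str(df.wellness) in the Data Cleaning section; the `usg_status` and `dehydrated` columns are names invented for this example, and the 1.025 cut-off is the one quoted in the codebook.

```r
# First four USGMeasurement / USG values as printed by str(df.wellness).
wellness <- data.frame(
  USGMeasurement = c("No", "Yes", "Yes", "Yes"),
  USG            = c(NA, 1.01, 1.02, 1.02),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: why is USG present or absent?
wellness$usg_status <- with(
  wellness,
  ifelse(USGMeasurement == "No", "not tested",
         ifelse(is.na(USG), "missing", "measured"))
)

# Only flag dehydration where a measurement actually exists.
wellness$dehydrated <- ifelse(
  wellness$usg_status == "measured",
  wellness$USG > 1.025,
  NA
)

print(wellness)
```

Treating "not tested" as its own category keeps those rows from being silently dropped or imputed as if a sensor had failed.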
General Advice
The challenge is deliberately large and vague. Feel free to identify a small problem within this much larger problem, and even to examine only a subset of the data (e.g., a single game or a single tournament).
Where to Begin: Data Codebook
Data Overview
Common Characteristics
The data were collected during the 2017-2018 season. There are five files that give different aspects of the games. The data themselves were collected through a variety of means.
- Player-level data are provided by the individual athletes themselves and by IMU/GPS devices worn on their vests during games. GPS data may not be available if players are out of range of the satellites. Players are uniquely identified by the PlayerID variable in all data files.
- Data are available on each game played during the season. Games are often organized in tournaments, which consist of up to 6 games. Each game consists of two 7-minute halves (except the final game of a tournament (game 6), which consists of two 10-minute halves). Games can have extra time at the referee’s discretion, if play is stopped for some reason during the game. There can be up to three games played on a single day. The order and time of the games is provided.
There were a total of 43 games, and they are identified by the GameID, which indicates the order in which the games were played throughout the season. (GameID=1 is the first game played in the season.)
There are four types of files, described below:
Games.CSV
Tells you when, where, opponent, and high-level outcomes and events in the game (“box scores”).
How were data collected: information comes from here
How to use: high-level game information.
Links: GameID links to GPS. Date links to wellness, GPS, Rate of Perceived Effort (RPE).
| Variable | Description |
|---|---|
| GameID | Unique identifier for the game |
| Date | Date on which the game was played |
| Tournament | Tournament that hosts game |
| TournamentGame | The game number of the tournament (1 = 1st game of tournament) |
| Team | Canada |
| Opponent | The country that Canada played against |
| Outcome | W if Canada won the game, L if Canada lost |
| TeamPoints | Number of points that Canada scored |
| TeamPointsAllowed | Number of points that the opposing team scored |
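Because several games can share one Date, joining the games file to a per-day file on Date alone would duplicate rows. One possible way around this, sketched below in base R, is to collapse games to one row per day before merging. The game rows reuse the first four GameID, Date and TeamPoints values shown by str(df.games) in the Data Cleaning section; the wellness rows and the `n_games` column are hypothetical.

```r
# First four games as printed by str(df.games).
games <- data.frame(
  GameID     = 1:4,
  Date       = as.Date(c("2017-11-30", "2017-11-30", "2017-11-30", "2017-12-01")),
  TeamPoints = c(19, 31, 31, 24)
)

# Hypothetical wellness rows for a single player (values are made up).
wellness <- data.frame(
  Date     = as.Date(c("2017-11-30", "2017-12-01", "2017-12-02")),
  PlayerID = c(2, 2, 2),
  Fatigue  = c(4, 3, 5)
)

# Collapse to one row per day: how many games were played that day?
games_per_day <- aggregate(GameID ~ Date, data = games, FUN = length)
names(games_per_day)[2] <- "n_games"

# Left join: keep every wellness row, attach the game count where one exists.
merged <- merge(wellness, games_per_day, by = "Date", all.x = TRUE)
merged$n_games[is.na(merged$n_games)] <- 0L  # days without a game

print(merged)
```

The merged table keeps one row per wellness report, so later per-player summaries are not inflated by three-game days.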
Wellness.csv
Self-reported health and wellness for each player.
How Collected: self-reported by each athlete. In principle, reported every morning before 8:30am. All values are subjective, except Urine Specific Gravity (USG), which is recorded through a sensor. Each athlete may have a different sense of what “typical” means for them, so consider standardizing per athlete.
How to Use: provides subjective sense of energy levels. USG can provide evidence of dehydration.
Links: Date links to games, wellness, RPE, GPS. PlayerID links to RPE, GPS.
| Variable | Description |
|---|---|
| Date | Date on which the game was played |
| PlayerID | Unique identifier for the player |
| Fatigue | Degree of fatigue (1-7 scale; see the scale labels table below) |
| Soreness | Degree of soreness (1-7 scale) |
| Desire | Degree of motivation (1-7 scale) |
| Irritability | Degree of irritability (1-7 scale) |
| BedTime | Time player went to bed |
| WakeTime | Time player woke up |
| SleepHours | The number of hours of sleep the player got |
| SleepQuality | Quality of sleep (1-7 scale) |
| MonitoringScore | The sum of the five scale variables |
| Pain | Is the player in pain? |
| Illness | Is the player feeling ill? |
| Menstruation | Is the player currently menstruating? |
| Nutrition | How is the player’s nutrition? |
| NutritionAdjustment | Has the player made a nutrition adjustment that day? |
| USGMeasurement | Was hydration tested? |
| USG | Urine specific gravity (above 1.025 indicates mild dehydration) |
| TrainingReadiness | How ready is the player to train? |

| Variable | Label | Description |
|---|---|---|
| Fatigue | 1 | Exhausted |
| Fatigue | 2 | NA |
| Fatigue | 3 | NA |
| Fatigue | 4 | Average |
| Fatigue | 5 | NA |
| Fatigue | 6 | NA |
| Fatigue | 7 | Fresher than Usual |
| Soreness | 1 | Sorer than Usual |
| Soreness | 2 | NA |
| Soreness | 3 | NA |
| Soreness | 4 | Average |
| Soreness | 5 | NA |
| Soreness | 6 | NA |
| Soreness | 7 | Better than Usual |
| Desire | 1 | Lower than Usual |
| Desire | 2 | NA |
| Desire | 3 | NA |
| Desire | 4 | Average |
| Desire | 5 | NA |
| Desire | 6 | NA |
| Desire | 7 | Higher than Usual |
| Irritability | 1 | Worse Mood than Usual |
| Irritability | 2 | NA |
| Irritability | 3 | NA |
| Irritability | 4 | Average |
| Irritability | 5 | NA |
| Irritability | 6 | NA |
| Irritability | 7 | Better Mood than Usual |
| SleepQuality | 1 | Restless |
| SleepQuality | 2 | NA |
| SleepQuality | 3 | NA |
| SleepQuality | 4 | Average |
| SleepQuality | 5 | NA |
| SleepQuality | 6 | NA |
| SleepQuality | 7 | Deep and Restful |
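The suggestion to standardize per athlete can be sketched as a within-player z-score. The example below uses base R and made-up Fatigue values for two hypothetical players; `zscore` and `Fatigue.z` are names chosen for illustration only.

```r
# Hypothetical Fatigue reports: player 1 tends to report low, player 2 high.
wellness <- data.frame(
  PlayerID = c(1, 1, 1, 2, 2, 2),
  Fatigue  = c(2, 3, 4, 5, 6, 7)
)

# Centre and scale a vector; returns NaN if a player never varies (sd = 0).
zscore <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# ave() applies zscore() separately within each PlayerID group.
wellness$Fatigue.z <- ave(wellness$Fatigue, wellness$PlayerID, FUN = zscore)

print(wellness)
```

After standardizing, a raw 4 from player 1 and a raw 7 from player 2 both sit one standard deviation above that player's own mean, which is the sense in which "typical" becomes comparable across athletes.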
RPE.csv
Rate of Perceived Effort. Self-reported workloads for each “session”. A session can be a workout (focusing on a particular objective) or a game.
How Collected: In theory, each player rates herself after each session and/or game. It is easy, however, for players to neglect this when playing back-to-back games. Note that each day there can be multiple “sessions”, and that a “session” can be a recovery period, a game, strength & conditioning, etc. There is no way to associate a particular rating with a particular game on days in which multiple games were played.
How to Use: Can be used to provide a subjective sense of fatigue. Note that what one player rates “4” for RPE another might rate “7” or any other number, so consider standardizing per player. For many sports analysts, a ratio of acute/chronic training load > 1.2 indicates that the athlete is currently in “high” training load and at an increased risk for injury. A ratio < 0.8 indicates that they are “de-training” or recovering. These cut-off values are based on Australian Football League players.
Links: Date links to wellness, games, GPS. PlayerID links to wellness and GPS
| Variable | Description |
|---|---|
| Date | Date |
| PlayerID | Unique identifier for the player |
| Training | Did the player train that day? |
| SessionType | Type of session |
| Duration | Duration of session (minutes) |
| RPE | Rate of perceived exertion (0-10 scale) |
| SessionLoad | Session load (in arbitrary units) = Duration * RPE |
| DailyLoad | Daily load (in arbitrary units) = Sum of SessionLoad for a given day, reported only once for a given day |
| AcuteLoad | Average daily load over past 7 days |
| ChronicLoad | Average daily load over past 30 days |
| AcuteChronicRatio | AcuteLoad/ChronicLoad. Values greater than 1.2 indicate high training load; values less than 0.8 indicate de-training |
| ObjectiveRating | Did player accomplish objective (1-10 scale) |
| FocusRating | Degree of focus during session (1-10 scale) |
| BestOutOfMyself | Did the player get the best out of herself today? |

| Variable | Label | Description |
|---|---|---|
| SessionType | Combat | Wrestling/Grappling practice |
| SessionType | Conditioning | Endurance training |
| SessionType | Game | Rugby 7s game |
| SessionType | Mobility/Recovery | Stretching, expanding range of motion |
| SessionType | Skills | Skills training |
| SessionType | Speed | Speed training |
| SessionType | Strength | Weightlifting |
| RPE | 0 | Nothing at all |
| RPE | 1 | Very light |
| RPE | 2 | Fairly light |
| RPE | 3 | Moderate |
| RPE | 4 | Somewhat hard |
| RPE | 5 | Hard |
| RPE | 6 | NA |
| RPE | 7 | Very Hard |
| RPE | 8 | NA |
| RPE | 9 | NA |
| RPE | 10 | Maximal effort |
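To make the load definitions concrete, here is a minimal base R sketch of the acute/chronic ratio computed from a series of daily loads. The daily loads are invented, the trailing windows are assumed to include the current day (the codebook does not say whether the supplied AcuteLoad and ChronicLoad do), and the label "moderate" for ratios between 0.8 and 1.2 is a placeholder of this example rather than a term from the codebook.

```r
# Hypothetical daily loads (arbitrary units): 28 steady days, then a heavy week.
daily_load <- c(rep(300, 28), rep(600, 7))

# Trailing mean over the last `window` days; NA until a full window exists.
rolling_mean <- function(x, window) {
  sapply(seq_along(x), function(i) {
    if (i < window) NA_real_ else mean(x[(i - window + 1):i])
  })
}

acute   <- rolling_mean(daily_load, 7)    # average daily load, past 7 days
chronic <- rolling_mean(daily_load, 30)   # average daily load, past 30 days
ratio   <- acute / chronic

# Apply the cut-offs quoted above: > 1.2 is "high", < 0.8 is "de-training".
classify_ratio <- function(r) {
  ifelse(is.na(r), NA_character_,
         ifelse(r > 1.2, "high",
                ifelse(r < 0.8, "de-training", "moderate")))
}
status <- classify_ratio(ratio)

print(data.frame(day = 30:35, ratio = round(ratio[30:35], 2), status = status[30:35]))
```

In this made-up series the heavy final week pushes the ratio above 1.2, so those days are labelled "high". Since the cut-offs come from Australian Football League players, they are best read as a starting point rather than a rule for Rugby 7s.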
GPS.csv
Position data for each player during a game.
How Collected: Data collected from sensors worn by players. Originally, data were collected at 100 Hz (100 times per second), but have been collapsed to 10 Hz. Thus, each second, there are 10 “frames” that provide information on player location and acceleration.
Note that we do not know the location of the ball, or the orientation of the playing field. The “z” acceleration is in the up-down direction, x is back-front, y is side-to-side.
How to Use: with caution. Note that making plots of location is unlikely to help you understand the role of fatigue unless you first think carefully about aspects of location that might be affected by fatigue. Some large-scale things to consider: can you infer tackles? Coaches usually encourage players to keep space between them.
Links: Date links to games, substitutions, wellness, RPE. PlayerID links to wellness, RPE. GameID links to games.
| Variable | Description |
|---|---|
| GameID | Unique identifier for game |
| Half | The half of the game |
| PlayerID | Unique identifier for the player |
| FrameID | Ordered unit of time within a half, ten frames per second. Example: FrameID = 100 implies we are 10 seconds into the half |
| Time | The current time |
| GameClock | The time on the game clock. The first half goes at least until 7:00 and the clock counts upwards. The game clock at the second half begins at 7:00 and goes until at least 14:00 |
| Speed | Movement speed of the player, in meters per second |
| AccelImpulse | The absolute value of change in speed divided by change in time |
| AccelLoad | Load detected by the accelerometer, in arbitrary units (AU) |
| AccelX | Acceleration in anteroposterior (back-front) axis direction (meters per second squared) |
| AccelY | Acceleration in mediolateral (side-to-side) axis direction (meters per second squared) |
| AccelZ | Acceleration in vertical direction (meters per second squared) |
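The FrameID and GameClock descriptions imply a simple conversion from frames to clock time. The sketch below is one way to write it, assuming 10 frames per second and 7-minute halves; `frame_to_clock` and its parameters are hypothetical names, and the final game of a tournament (10-minute halves) would need `half_minutes = 10`, which is an assumption since the codebook only describes the 7-minute case.

```r
# Convert a frame number within a half to a "mm:ss" game-clock string.
# Assumes the second half starts where a full-length first half would end.
frame_to_clock <- function(frame_id, half, half_minutes = 7, hz = 10) {
  secs <- frame_id / hz + (half - 1) * half_minutes * 60
  sprintf("%02d:%02d",
          as.integer(secs %/% 60),
          as.integer(floor(secs %% 60)))
}

frame_to_clock(100, half = 1)  # 10 seconds into the first half
frame_to_clock(100, half = 2)  # 10 seconds into the second half
```

The first call matches the codebook's own example (FrameID = 100 is 10 seconds into the half). Extra time means the recorded GameClock can run past these nominal values, so the supplied column is still the better source when it is present.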
Consider learning about Rugby 7s
Questions about the data?
We have created a google doc for you! Please ask one of the DataFest assistants to post your question or to check to see if the question has already been answered. Our “data experts” will check frequently.
Data Cleaning
## Classes 'data.table' and 'data.frame': 38 obs. of 9 variables:
## $ GameID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Date : chr "2017-11-30" "2017-11-30" "2017-11-30" "2017-12-01" ...
## $ Tournament : chr "Dubai" "Dubai" "Dubai" "Dubai" ...
## $ TournamentGame : int 1 2 3 4 5 6 1 2 3 4 ...
## $ Team : chr "Canada" "Canada" "Canada" "Canada" ...
## $ Opponent : chr "Spain" "Ireland" "Fiji" "France" ...
## $ Outcome : chr "W" "W" "W" "W" ...
## $ TeamPoints : int 19 31 31 24 7 5 24 24 19 28 ...
## $ TeamPointsAllowed: int 0 0 14 19 25 10 12 12 5 12 ...
## - attr(*, ".internal.selfref")=<externalptr>
## GameID Date Tournament TournamentGame
## "integer" "character" "character" "integer"
## Team Opponent Outcome TeamPoints
## "character" "character" "character" "integer"
## TeamPointsAllowed
## "integer"
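# Packages used by the cleaning code in this section
library(dplyr)   # mutate(), select(), filter(), group_by(), summarise()
library(chron)   # chron(times = ...) for clock times
library(readr)   # parse_number()
library(knitr)   # kable()

df.games <- df.games %>%
  mutate(game = as.factor(GameID),
         date = as.Date(Date),
         tour = as.factor(Tournament),
         t.game = as.factor(TournamentGame),
         team = as.factor(Team),
         opp = as.factor(Opponent),
         # Encode the outcome as 1 = win ("W"), 0 = loss ("L")
         win = as.numeric(factor(Outcome, levels = c("L", "W"), ordered = TRUE)) - 1,
         pts.scored = as.integer(TeamPoints),
         pts.allowed = as.integer(TeamPointsAllowed)) %>%
  mutate(pts.dif = pts.scored - pts.allowed,
         pts.tot = pts.scored + pts.allowed) %>%
  select(game, date, tour, t.game, team, opp, win, pts.scored, pts.allowed, pts.tot, pts.dif)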
# GPS Data
str(df.gps)
## Classes 'data.table' and 'data.frame': 4570160 obs. of 14 variables:
## $ GameID : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Half : int 1 1 1 1 1 1 1 1 1 1 ...
## $ PlayerID : int 2 2 2 2 2 2 2 2 2 2 ...
## $ FrameID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Time : chr "00:22:01" "00:22:01" "00:22:01" "00:22:01" ...
## $ GameClock : chr "00:00:00" "00:00:00" "00:00:00" "00:00:00" ...
## $ Speed : num 0.658 0.594 0.364 0.444 0.4 ...
## $ AccelImpulse: num 0.611 0.639 2.306 0.806 0.444 ...
## $ AccelLoad : num 0.00533 0.00657 0.00311 0.0026 0.00381 ...
## $ AccelX : num 0.1325 0.11125 0.01375 0.00625 -0.0175 ...
## $ AccelY : num 0.699 0.92 0.77 0.886 0.858 ...
## $ AccelZ : num 0.565 0.706 0.677 0.595 0.574 ...
## $ Longitude : num 55.5 55.5 55.5 55.5 55.5 ...
## $ Latitude : num 25 25 25 25 25 ...
## - attr(*, ".internal.selfref")=<externalptr>
## GameID Half PlayerID FrameID Time
## "integer" "integer" "integer" "integer" "character"
## GameClock Speed AccelImpulse AccelLoad AccelX
## "character" "numeric" "numeric" "numeric" "numeric"
## AccelY AccelZ Longitude Latitude
## "numeric" "numeric" "numeric" "numeric"
df.gps <- df.gps %>%
  mutate(game = as.factor(GameID),
         half = as.factor(Half),
         player = as.factor(PlayerID),
         frame = as.factor(FrameID),
         # Parse "hh:mm:ss" strings into chron time-of-day values
         time = chron(times = Time),
         game.time = chron(times = GameClock)) %>%
  select(game, half, player, frame, time, game.time,
         Speed, AccelImpulse, AccelLoad, AccelX, AccelY, AccelZ, Longitude, Latitude)
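The codebook defines AccelImpulse as the absolute change in speed divided by the change in time. A quick sanity check on the first five Speed values printed by str(df.gps) above, assuming a fixed 0.1 s between frames (10 Hz), reproduces the printed AccelImpulse values to within the rounding of the display. The variable names below are chosen for this example.

```r
# First five Speed and AccelImpulse values as printed by str(df.gps).
speed    <- c(0.658, 0.594, 0.364, 0.444, 0.4)
reported <- c(0.611, 0.639, 2.306, 0.806, 0.444)

dt <- 1 / 10  # seconds between frames at 10 Hz (assumed constant)

# |change in speed| / change in time; the first frame has no predecessor here.
impulse <- c(NA, abs(diff(speed)) / dt)

print(data.frame(speed, reported, recomputed = round(impulse, 3)))
```

The first reported value cannot be recomputed because it depends on a frame that is not shown, and the small remaining differences are consistent with the speeds being printed to three decimals.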
# Wellness Data
str(df.wellness)
## Classes 'data.table' and 'data.frame': 5011 obs. of 19 variables:
## $ Date : chr "2018-07-21" "2018-07-21" "2018-07-21" "2018-07-21" ...
## $ PlayerID : int 1 2 3 4 5 7 10 11 13 14 ...
## $ Fatigue : int 3 4 3 2 5 2 2 4 4 3 ...
## $ Soreness : int 3 3 3 3 3 2 2 3 3 3 ...
## $ Desire : int 2 4 5 5 4 5 4 6 5 4 ...
## $ Irritability : int 3 4 4 4 4 4 4 4 4 4 ...
## $ BedTime : chr "23:00:00" "23:00:00" "22:30:00" "00:30:00" ...
## $ WakeTime : chr "07:00:00" "07:00:00" "06:30:00" "07:00:00" ...
## $ SleepHours : num 8 8 8 6.5 7.25 9 7.25 8 8 8.75 ...
## $ SleepQuality : int 2 4 4 1 4 3 3 3 4 4 ...
## $ MonitoringScore : int 13 19 19 15 20 16 15 20 20 18 ...
## $ Pain : chr "No" "Yes" "No" "No" ...
## $ Illness : chr "No" "No" "No" "No" ...
## $ Menstruation : chr "Yes" "Yes" "No" "Yes" ...
## $ Nutrition : chr "Excellent" NA NA "Excellent" ...
## $ NutritionAdjustment: chr "Yes" NA NA "Yes" ...
## $ USGMeasurement : chr "No" "Yes" "Yes" "Yes" ...
## $ USG : num NA 1.01 1.02 1.02 1.02 ...
## $ TrainingReadiness : chr "0%" "0%" "100%" "95%" ...
## - attr(*, ".internal.selfref")=<externalptr>
## [1] "No" "Yes"
## Date PlayerID Fatigue
## "character" "integer" "integer"
## Soreness Desire Irritability
## "integer" "integer" "integer"
## BedTime WakeTime SleepHours
## "character" "character" "numeric"
## SleepQuality MonitoringScore Pain
## "integer" "integer" "character"
## Illness Menstruation Nutrition
## "character" "character" "character"
## NutritionAdjustment USGMeasurement USG
## "character" "character" "numeric"
## TrainingReadiness
## "character"
df.test <- df.wellness %>%
  mutate(date = as.Date(Date),
         player = as.factor(PlayerID),
         bedtime = chron(times = BedTime),
         waketime = chron(times = WakeTime),
         # Yes/No answers become 0/1; ordered answers become 0, 1, 2
         pain = as.numeric(factor(Pain,
                                  levels = c("No", "Yes"),
                                  ordered = TRUE)) - 1,
         ill = as.numeric(factor(Illness,
                                 levels = c("No", "Slightly Off", "Yes"),
                                 ordered = TRUE)) - 1,
         mens = as.numeric(factor(Menstruation,
                                  levels = c("No", "Yes"),
                                  ordered = TRUE)) - 1,
         nutrition = as.numeric(factor(Nutrition,
                                       levels = c("Poor", "Okay", "Excellent"),
                                       ordered = TRUE)) - 1,
         # Subtracting 2 codes No = -1, I Don't Know = 0, Yes = 1
         nutrition.adj = as.numeric(factor(NutritionAdjustment,
                                           levels = c("No", "I Don't Know", "Yes"),
                                           ordered = TRUE)) - 2,
         USGmeasurement = as.numeric(factor(USGMeasurement,
                                            levels = c("No", "Yes"),
                                            ordered = TRUE)) - 1,
         # Strip the percent sign, e.g. "95%" becomes 95
         trainingreadiness = parse_number(as.character(TrainingReadiness))) %>%
  select(date, player, Fatigue, Soreness, Desire, Irritability, bedtime, waketime, SleepHours, SleepQuality, MonitoringScore, pain, ill, mens, nutrition, nutrition.adj, USGmeasurement, USG, trainingreadiness)
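The codebook describes MonitoringScore as the sum of the five scale variables, and the first five rows printed by str(df.wellness) above are enough to check that. The sketch below recomputes the sum and then shows what a weighted alternative could look like; the weights are purely hypothetical placeholders (chosen to sum to 5 so the result stays on the same scale) and are not a recommendation.

```r
# First five rows of the scale variables as printed by str(df.wellness).
first_rows <- data.frame(
  Fatigue         = c(3, 4, 3, 2, 5),
  Soreness        = c(3, 3, 3, 3, 3),
  Desire          = c(2, 4, 5, 5, 4),
  Irritability    = c(3, 4, 4, 4, 4),
  SleepQuality    = c(2, 4, 4, 1, 4),
  MonitoringScore = c(13, 19, 19, 15, 20)
)

components <- c("Fatigue", "Soreness", "Desire", "Irritability", "SleepQuality")

# Does the supplied score equal the plain sum of its components?
recomputed <- rowSums(first_rows[, components])
all(recomputed == first_rows$MonitoringScore)

# Hypothetical weights, for illustration only: emphasise Fatigue.
weights <- c(Fatigue = 1.5, Soreness = 1, Desire = 0.75,
             Irritability = 0.75, SleepQuality = 1)
weighted_score <- drop(as.matrix(first_rows[, components]) %*% weights[components])

print(cbind(first_rows, weighted_score))
```

Whether any set of weights beats the simple sum is exactly the open question raised in the challenge; one option would be to estimate weights from the data rather than pick them by hand.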
# RPE Data
Rugby Fatigue Research
From Edwin Cook (Pat’s rugby friend)
- Fatigue in rugby is similar to really any other sport. Conditioning is huge, you need to be fit in order to play 80-90 minutes at top performance
- Injuries are similar to football and hockey. Most common are ankle, knee and shoulder injuries I’d say
Initial Analysis
df <- df.games
# Per-tournament summary: number of games, wins, and points
df.tour.sum <- df %>%
  group_by(tour) %>%
  # t.game is a factor, so convert via character to recover the game number
  summarise(games = max(as.numeric(as.character(t.game))),
            wins = sum(as.numeric(win)),
            points.scored = sum(pts.scored),
            points.allowed = sum(pts.allowed),
            total.points = sum(pts.tot),
            differential = sum(pts.dif))
kable(df.tour.sum, caption = "Summary of Games by Tournament")

| tour | games | wins | points.scored | points.allowed | total.points | differential |
|---|---|---|---|---|---|---|
| Commonwealth | 5 | 2 | 86 | 93 | 179 | -7 |
| Dubai | 6 | 4 | 117 | 68 | 185 | 49 |
| Kitakyushu | 5 | 2 | 114 | 92 | 206 | 22 |
| Langford | 6 | 4 | 140 | 101 | 241 | 39 |
| Paris | 6 | 4 | 116 | 116 | 232 | 0 |
| Sydney | 6 | 5 | 135 | 79 | 214 | 56 |
| World Cup | 4 | 2 | 98 | 79 | 177 | 19 |
df.paris <- df.games %>% filter(tour == "Paris")
kable(df.paris, caption = "Paris Tournament Games")

| game | date | tour | t.game | team | opp | win | pts.scored | pts.allowed | pts.tot | pts.dif |
|---|---|---|---|---|---|---|---|---|---|---|
| 29 | 2018-06-08 | Paris | 1 | Canada | Russia | 1 | 31 | 5 | 36 | 26 |
| 30 | 2018-06-08 | Paris | 2 | Canada | Fiji | 1 | 21 | 12 | 33 | 9 |
| 31 | 2018-06-08 | Paris | 3 | Canada | Australia | 0 | 14 | 31 | 45 | -17 |
| 32 | 2018-06-09 | Paris | 4 | Canada | USA | 1 | 26 | 24 | 50 | 2 |
| 33 | 2018-06-09 | Paris | 5 | Canada | New Zealand | 0 | 7 | 34 | 41 | -27 |
| 34 | 2018-06-10 | Paris | 6 | Canada | France | 1 | 17 | 10 | 27 | 7 |
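Since players can face up to three games in one day, one simple first look is whether the points differential changes with a game's position within the day. The base R sketch below uses the six Paris games from the table above; `game.of.day` is a column name invented here, and six games are far too few to support any conclusion, so treat this as a template for the full season rather than a finding.

```r
# Paris tournament games, copied from the table above.
paris <- data.frame(
  game    = 29:34,
  date    = as.Date(c("2018-06-08", "2018-06-08", "2018-06-08",
                      "2018-06-09", "2018-06-09", "2018-06-10")),
  pts.dif = c(26, 9, -17, 2, -27, 7)
)

# Position of each game within its day (1 = first game that day).
paris$game.of.day <- ave(paris$game, paris$date, FUN = seq_along)

# Mean points differential by position within the day.
by_order <- aggregate(pts.dif ~ game.of.day, data = paris, FUN = mean)
print(by_order)
```

Opponent quality is confounded with game order in a tournament bracket, which ties back to the earlier question of whether the quality of the opponent should be considered when examining fatigue.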