Calculating the Stability in AFL

The stability of a metric is important in sport for making knowledgeable decisions about future skill. Furthermore, having a stable metric can increase confidence in the metric of predicting future performance.

For this analysis we will be calculating the stability for the variables clearances and marks.

The equation for stability is 1 - (mean[(Xspm - Xpm)^2 - BV[Xspm]]/mean[(Xspm - Xm)^2 - BV[Xspm]])

Step 1: Load Libraries and Load Data

Load the library tidyverse() and load the data set with the player metrics that you want to assess by using read.csv().

#Load Library
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

#Load Data
afl_data <- read.csv("AT_data_afl.csv")

Step 2: Clean Data

Using mutate() calculate the total minutes of play for each player each game.

afl_data <- afl_data %>% 
  mutate(Mins = (time_on_ground_percentage/100)*80)

Again using mutate() and group_by() to calculate the players total minutes and how many seasons each player has played. Aswell as changing season and player_id to a character useing as.character().

#Cleaning the data
afl_data <- afl_data %>% 
  group_by(player_id) %>% 
  mutate(totalmin = sum(Mins)) %>% 
  mutate(season = as.character(season)) %>% 
  mutate(nseason = n_distinct(season)) %>% 
  mutate(player_id = as.character(player_id))

Step 3: Summarise the data

This step involves summarising the data to only include the variables needed in the analysis, which includes clearances and marks.

#Create a new data Frame
afl_data1 <- afl_data %>% 
  group_by(player_id,season) %>% 
  summarise(player_id = player_id[1],
            season = season,
            player_team = player_team[1],
            marks = mean(marks),
            clearances = mean(clearances),
            totalmin = totalmin,
            nseason = nseason[1])

## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `summarise()` has grouped output by 'player_id', 'season'. You can override
## using the `.groups` argument.

Step 4: Calculate the Average of Xpm value

Xpm is the calculation of each metric for all player over all seasons.

#Calculating Xpm
avg.Xpm <- afl_data1 %>% 
  group_by(player_id) %>% 
  summarise(Xpm.player_id = player_id[1],
            Xpm.season = season,
            Xpm.player_team = player_team[1],
            Xpm.marks = mean(marks),
            Xpm.clearances = mean(clearances))

## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `summarise()` has grouped output by 'player_id'. You can override using the
## `.groups` argument.

Step 5: Calculate the Xspm value

Xspm is the average value of every player across every season

#Calculating Xspm
Xspm <- afl_data1 %>% 
  group_by(player_id,season) %>% 
  summarise(Xspm.player_id = player_id[1],
            Xspm.season = season,
            Xspm.player_team = player_team[1],
            Xspm.marks = mean(marks),
            Xspm.clearances = mean(clearances))

## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `summarise()` has grouped output by 'player_id', 'season'. You can override
## using the `.groups` argument.

Step 6: Calculate the square root of Xspm - Xpm

To calculate the square root of Xspm - Xpm we first need to use the left_join() finction to combine the two data frames above. Then we use the mutate() function to calculate this equation (Xspm - Xpm)^2 for our variables

sqXspm_Xpm <- left_join(Xspm, avg.Xpm, by = "player_id") %>% 
  mutate(sq.marks = (Xspm.marks - Xpm.marks)^2,
         sq.clearances = (Xspm.clearances - Xpm.clearances)^2)

## Warning in left_join(Xspm, avg.Xpm, by = "player_id"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 1 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

Step 7: Calucating the Bootstrapped Variance for Xspm BV[Xspm]

The bootstrap method uses random sampling with replacement on a data set and therefore creating a new version of the dataset. This method increase accuracy of the model and estimates the size of confidence intervals.

#bootstrapping the metrics
#Create a dataframe with every player's stats
aflplayerseason <- afl_data[c('season','player_id')] %>%
  unique()

nboots = 10 # number of bootstraps (replays) we are going to do
replyafl = list() # empty list to store results
for (i in 1:nboots){
  show(i)
  
  # randomly re-sample at the team-game level
  resamp_aflplayerseason <- aflplayerseason %>%
    group_by(player_id, season) %>%
    sample_frac(size = 1, replace = T)
  
   # join the player stats
  new_alfplayerseason <- left_join(resamp_aflplayerseason,
                                   afl_data,
                                   by = c("season", "player_id"))
  
  #summary stats for this reply
  newseason_statsafl <- new_alfplayerseason %>%
    group_by(player_id) %>% 
    summarise(player_id = player_id[1],
              season = season,
              player_team = player_team[1],
              marks = mean(marks),
             clearances = mean(clearances),
              totalmin = totalmin,
              nseason = nseason[1])
  
  # store results in the list
  replyafl[[i]] <- newseason_statsafl
  
}

## [1] 1

## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `summarise()` has grouped output by 'player_id'. You can override using the
## `.groups` argument.

## [1] 2

## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `summarise()` has grouped output by 'player_id'. You can override using the
## `.groups` argument.

## [1] 3

## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `summarise()` has grouped output by 'player_id'. You can override using the
## `.groups` argument.

## [1] 4

## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `summarise()` has grouped output by 'player_id'. You can override using the
## `.groups` argument.

## [1] 5

## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `summarise()` has grouped output by 'player_id'. You can override using the
## `.groups` argument.

## [1] 6

## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `summarise()` has grouped output by 'player_id'. You can override using the
## `.groups` argument.

## [1] 7

## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `summarise()` has grouped output by 'player_id'. You can override using the
## `.groups` argument.

## [1] 8

## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `summarise()` has grouped output by 'player_id'. You can override using the
## `.groups` argument.

## [1] 9

## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `summarise()` has grouped output by 'player_id'. You can override using the
## `.groups` argument.

## [1] 10

## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## `summarise()` has grouped output by 'player_id'. You can override using the
## `.groups` argument.

# bind all the re-samples into 1 dataframe
resamp_resultsafl <- bind_rows(replyafl)

Now we are going to work out the bootstrapped variances for each metric

#bootstrapped variances
bootstrap_varsafl <- resamp_resultsafl %>%
  group_by(player_id) %>% 
  summarise(player_id = player_id[1],
            season = season[1],
            player_team = player_team[1],
            var.marks = var(marks),
            var.clearances = var(clearances))

Step 8: Calculating the numerator

To calculate the numerator we use this equation mean[(Xspm - Xpm)^2 - BV[Xspm]], which we have every calculated every metric we need.

#Calculating the numerator for Marks
smn.marks <-  (sqXspm_Xpm$sq.marks - bootstrap_varsafl$var.marks)

## Warning in sqXspm_Xpm$sq.marks - bootstrap_varsafl$var.marks: longer object
## length is not a multiple of shorter object length

#Need to average the numerator
smn.marks <- mean(smn.marks)

#Calculating the numerator for Clearances
smn.clearances <-  (sqXspm_Xpm$sq.clearances - bootstrap_varsafl$var.clearances)

## Warning in sqXspm_Xpm$sq.clearances - bootstrap_varsafl$var.clearances: longer
## object length is not a multiple of shorter object length

#Need to average the numerator
smn.clearances <- mean(smn.clearances)

Step 9: Calculating the Dominator

Part A:

For Part A we need to calculate the metric X.m which is the average over all players and seasons

#Calculating X.m 
X.m <- afl_data %>% 
  summarise(marks = mean(marks),
            clearances = mean(clearances))

Part B:

Now we have all the metrics we need to work out the Denominator equation which is mean[(Xspm - Xm)^2 - BV[Xspm]]

#Calculating the denominator for Marks
smd.marks <-(Xspm$Xspm.marks - X.m$marks)^2 - bootstrap_varsafl$var.marks

## Warning in Xspm$Xspm.marks - X.m$marks: longer object length is not a multiple
## of shorter object length

## Warning in (Xspm$Xspm.marks - X.m$marks)^2 - bootstrap_varsafl$var.marks:
## longer object length is not a multiple of shorter object length

#Need to average the denominator
smd.marks <- mean(smd.marks)

#Calculating the denominator for Clearances
smd.clearances <- (Xspm$Xspm.clearances - X.m$clearances)^2 - bootstrap_varsafl$var.clearances

## Warning in Xspm$Xspm.clearances - X.m$clearances: longer object length is not a
## multiple of shorter object length

## Warning in (Xspm$Xspm.clearances - X.m$clearances)^2 -
## bootstrap_varsafl$var.clearances: longer object length is not a multiple of
## shorter object length

#Need to average the denominator
smd.clearances <- mean(smd.clearances)

Step 10: Calculating Stability

Finally after working out the denominator and the numerator for the equation we can workout the stability of the metrics. We use this equation: 1 - ([(Xspm - Xpm)^2 - BV[Xspm]]/[(Xspm - Xm)^2 - BV[Xspm]]) For this we do not need to add the mean of the denominator and the numerator because we have already did this in a step above

#Stability of Marks
sm.mark <- 1-(smn.marks/smd.marks)
#To see the Mark stability value
sm.mark

## [1] 0.8464899

#Stability of Clearances
sm.clearances <- 1-(smn.clearances/smd.clearances)
#To see the Mark stability value
sm.clearances

## [1] 0.88976