Introduction

This R document details the methods used to develop the normative base in the Artificial Rocket League Coach (ARC). The premise of the ARC is that semi-pro Rocket League (https://www.youtube.com/c/rocketleague) players can submit their replay data (recorded games) to the ARC to receive personalized coaching on how to improve their play at the Grand Champion rank and above. For the moment, all replay data is managed by Ballchasing (https://www.ballchasing.com) which is a fan-made API that allows users to upload, parse, query, and download Rocket League replay information.

Sample

The ARC’s output depends on comparing the user’s replay data to that of other players of the same rank and circumstances. For the ARC’s output to be valid coaching advice, we must first establish a representative normative sample. Because Psyonix (the developer of Rocket League) does not reveal player matchmaking rank (MMR), we have to manually find an adequate sample of players at the Grand Champion rank and above. We also need the sample to represent competitive (ranked) play in 3v3 (standard) games. To do this we will look to Rocket Sundays, a fan organization that facilitates many social activities related to Rocket League, including regular eSports tournaments. The tournaments are organized into “seasons”. More information on Rocket Sundays’ tournaments can be found at their website (https://www.sundaysesport.co.uk/rocket-sundays). Organizers of the Rocket Sundays tournaments upload every game played into a replay group on the Ballchasing website. We are going to use Rocket Sundays’ replay groups to derive our normative sample for the ARC. Later we may augment the sample with other comparable sources.

There are many benefits to using this community-created replay group as the normative sample. First, it allows us to easily limit our sample to players at the Grand Champion rank. Second, because these are tournament-level games, it allows us to assume that the play quality of the replays is high (in other words, these replays represent each player’s best performance). Third, it allows us to control for the inevitable drift in variable values over time (the player population of Rocket League continues to improve and so mean scores of variables will increase or decrease; see https://blog.calculated.gg/2019/01/if-rlcs-season-1-players-were-transported-to-2019-what-rank-would-they-be/ for more information). Finally, when querying a replay group through the Ballchasing API, the replay data is already summarized (usually averaged) by player which provides stability in each variable’s variance.

So, let’s get started and pull the sample from the https://www.ballchasing.com/ API using the httr2 package:

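    library(httr2)  # request(), req_headers(), req_perform()
    library(dplyr)  # %>%, select(), summarise_all(), as_tibble()

    # Read the API key from an environment variable instead of hardcoding it.
    # BALLCHASING_API_KEY is a variable name chosen here for illustration.
    req <- request("https://ballchasing.com/api/groups/rocket-sundays-season-11-elite-t-7d6bfzlsth") %>%
      req_headers("Authorization" = Sys.getenv("BALLCHASING_API_KEY"), "Accept" = "application/json")

    res <- req_perform(req)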

You can refer to the Ballchasing API documentation to see the format that the data comes in, but it needs a bit of parsing to get it into a format suitable for norming:

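    library(tidyjson)  # enter_object(), gather_array(), spread_all()
    # Step into the "players" array and spread each player's nested stats into one row.
    data <- resp_body_string(res)
    data <- data %>% enter_object(players) %>% gather_array() %>% spread_all() %>% as_tibble()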

Ballchasing returns quite a bit of data. All of it is useful somehow, but only a portion of it is useful to the ARC. So we are going to pick out variables that offer the potential to be useful to the ARC:

    selectedData <- data %>% select(all_of(
      c(
        "game_average.boost.bpm",
        "game_average.boost.avg_amount",
        "game_average.boost.amount_collected",
        "game_average.boost.amount_stolen",
        "game_average.boost.percent_zero_boost",
        "game_average.boost.percent_full_boost",
        "game_average.boost.amount_overfill",
        "game_average.boost.amount_overfill_stolen",
        "game_average.boost.amount_used_while_supersonic",
        "game_average.boost.percent_boost_0_25",
        "game_average.boost.percent_boost_25_50",
        "game_average.boost.percent_boost_50_75",
        "game_average.boost.percent_boost_75_100",
        "game_average.movement.avg_speed",
        "game_average.movement.total_distance",
        "game_average.movement.count_powerslide",
        "game_average.movement.avg_powerslide_duration",
        "game_average.movement.avg_speed_percentage",
        "game_average.movement.percent_slow_speed",
        "game_average.movement.percent_boost_speed",
        "game_average.movement.percent_supersonic_speed",
        "game_average.movement.percent_ground",
        "game_average.movement.percent_low_air",
        "game_average.movement.percent_high_air",
        "game_average.positioning.avg_distance_to_ball",
        "game_average.positioning.avg_distance_to_ball_possession",
        "game_average.positioning.avg_distance_to_ball_no_possession",
        "game_average.positioning.percent_defensive_third",
        "game_average.positioning.percent_offensive_third",
        "game_average.positioning.percent_neutral_third",
        "game_average.positioning.percent_defensive_half",
        "game_average.positioning.percent_offensive_half",
        "game_average.positioning.percent_behind_ball",
        "game_average.positioning.percent_infront_ball",
        "game_average.demo.inflicted",
        "game_average.demo.taken"
        )
      )
    )

The research team chose these variables for a few reasons.

First, the variables appear to be moderately to strongly correlated with player MMR and they also hold relatively even distributions across the player population of Rocket League. These basic averages and distributions by rank can be observed on the main Ballchasing dashboard.

Another reason we selected these variables is that they describe player behavior regardless of win/loss conditions. By contrast, some variables returned by Ballchasing, such as game_average.core.shots or game_average.core.saves, describe the outcome of a game, which is independent of a player’s MMR or general skill. In fact, earlier unpublished findings by our research team show no correlation between any of the core statistics and MMR. We did, however, find moderate to strong correlations between a player’s win percentage and the core variables.

Another important consideration: because the replay group contains replays from a tournament, the number of replays per player varies. In other words, the players who won the tournament played more games than the players who were eliminated in the first round. Because of this, we cannot use cumulative data, absolute counts, or other non-average statistics from the replay group, since they come from unequally sized sub-samples. The selected variables allow us to explore relationships without negative effects from this imbalance. It does mean that some player averages are based on more games than others, but that effect seems negligible.
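To make the point about unequal sub-samples concrete, here is a minimal sketch in base R. The two players and their per-game demolition counts are invented for illustration and are not taken from the Rocket Sundays data. It shows that totals scale with games played while per-game means stay comparable, and that the mean of the player with fewer games is estimated less precisely.

```r
# Hypothetical per-game demolition counts for two players with the same
# underlying rate (0.8 per game). These numbers are made up for illustration.
finalist   <- c(1, 0, 2, 1, 0, 1, 1, 0, 1, 1, 0, 2, 1, 0, 1, 1, 0, 1, 2, 0)  # 20 games
eliminated <- c(1, 0, 2, 1, 0)                                               # 5 games

# Totals grow with the number of games, so they are not comparable.
totals <- c(finalist = sum(finalist), eliminated = sum(eliminated))

# Per-game means are on the same scale for both players.
means <- c(finalist = mean(finalist), eliminated = mean(eliminated))

# The standard error of the mean is larger when fewer games were played.
std_err <- c(
  finalist   = sd(finalist) / sqrt(length(finalist)),
  eliminated = sd(eliminated) / sqrt(length(eliminated))
)

print(totals)
print(means)
print(round(std_err, 3))
```

Whether the difference in precision matters in practice is a judgment call that depends on how many games the least active players contributed.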

Summary Statistics

Next we will calculate the summary statistics for each variable:

    library(moments)  # skewness(), kurtosis(); assumed source, the text does not name the package
    library(knitr)    # kable()

    summaryData <- data.frame(selectedData %>% summarise_all(mean))
    summaryData[2, ] <- selectedData %>% summarise_all(sd)
    summaryData[3, ] <- selectedData %>% summarise_all(median)
    summaryData[4, ] <- selectedData %>% summarise_all(min)
    summaryData[5, ] <- selectedData %>% summarise_all(max)
    summaryData[6, ] <- selectedData %>% summarise_all(skewness)
    summaryData[7, ] <- selectedData %>% summarise_all(kurtosis)

    rownames(summaryData) <- c("mean", "sd", "median", "min", "max", "skewness", "kurtosis")

    kable(t(summaryData), digits = 2)

| variable | mean | sd | median | min | max | skewness | kurtosis |
|---|---|---|---|---|---|---|---|
| game_average.boost.bpm | 397.34 | 38.54 | 407.13 | 312.46 | 461.87 | -0.37 | 2.33 |
| game_average.boost.avg_amount | 48.86 | 2.53 | 49.04 | 43.63 | 54.43 | -0.17 | 2.76 |
| game_average.boost.amount_collected | 2449.86 | 259.75 | 2536.08 | 1678.67 | 3086.00 | -0.52 | 3.40 |
| game_average.boost.amount_stolen | 533.27 | 96.87 | 545.31 | 288.14 | 760.50 | -0.42 | 2.84 |
| game_average.boost.percent_zero_boost | 11.36 | 2.70 | 11.49 | 5.62 | 18.48 | 0.21 | 3.06 |
| game_average.boost.percent_full_boost | 10.84 | 2.93 | 10.73 | 5.56 | 20.54 | 0.90 | 4.54 |
| game_average.boost.amount_overfill | 385.61 | 111.50 | 384.64 | 82.33 | 656.16 | 0.19 | 3.36 |
| game_average.boost.amount_overfill_stolen | 70.04 | 29.47 | 65.74 | 22.12 | 155.17 | 0.71 | 3.54 |
| game_average.boost.amount_used_while_supersonic | 301.61 | 91.38 | 285.39 | 101.33 | 613.83 | 0.74 | 4.45 |
| game_average.boost.percent_boost_0_25 | 31.59 | 4.07 | 31.33 | 21.32 | 43.15 | 0.27 | 3.41 |
| game_average.boost.percent_boost_25_50 | 22.11 | 1.96 | 21.87 | 17.83 | 26.59 | 0.19 | 2.71 |
| game_average.boost.percent_boost_50_75 | 17.31 | 1.87 | 17.19 | 12.28 | 23.22 | 0.28 | 4.24 |
| game_average.boost.percent_boost_75_100 | 24.56 | 2.88 | 24.84 | 17.25 | 32.11 | -0.01 | 3.47 |
| game_average.movement.avg_speed | 1512.74 | 50.99 | 1519.94 | 1357.95 | 1626.19 | -0.74 | 4.15 |
| game_average.movement.total_distance | 521383.50 | 23769.80 | 523384.68 | 449191.00 | 594001.20 | -0.20 | 4.56 |
| game_average.movement.count_powerslide | 65.23 | 24.64 | 61.61 | 18.67 | 114.00 | 0.24 | 2.35 |
| game_average.movement.avg_powerslide_duration | 0.12 | 0.03 | 0.11 | 0.08 | 0.22 | 1.15 | 4.95 |
| game_average.movement.avg_speed_percentage | 65.77 | 2.22 | 66.08 | 59.04 | 70.70 | -0.74 | 4.15 |
| game_average.movement.percent_slow_speed | 47.65 | 3.38 | 47.54 | 39.64 | 56.88 | 0.48 | 3.84 |
| game_average.movement.percent_boost_speed | 38.83 | 1.77 | 38.47 | 35.79 | 42.94 | 0.22 | 2.27 |
| game_average.movement.percent_supersonic_speed | 13.52 | 2.82 | 13.93 | 6.38 | 21.20 | 0.02 | 3.75 |
| game_average.movement.percent_ground | 57.68 | 2.80 | 57.89 | 49.41 | 66.40 | -0.05 | 4.60 |
| game_average.movement.percent_low_air | 38.20 | 2.50 | 38.00 | 31.45 | 43.34 | -0.14 | 2.86 |
| game_average.movement.percent_high_air | 4.12 | 1.01 | 4.12 | 2.14 | 8.06 | 0.83 | 6.04 |
| game_average.positioning.avg_distance_to_ball | 2478.05 | 135.84 | 2480.88 | 2111.78 | 2769.74 | -0.19 | 2.83 |
| game_average.positioning.avg_distance_to_ball_possession | 2325.12 | 156.74 | 2348.43 | 1882.76 | 2615.05 | -0.41 | 2.96 |
| game_average.positioning.avg_distance_to_ball_no_possession | 2615.79 | 141.40 | 2619.56 | 2336.01 | 2968.47 | 0.04 | 2.69 |
| game_average.positioning.percent_defensive_third | 47.25 | 3.98 | 46.74 | 38.61 | 55.97 | 0.22 | 2.56 |
| game_average.positioning.percent_offensive_third | 20.83 | 2.43 | 20.91 | 15.81 | 26.84 | -0.01 | 2.67 |
| game_average.positioning.percent_neutral_third | 31.92 | 2.18 | 32.14 | 27.74 | 38.89 | 0.32 | 3.77 |
| game_average.positioning.percent_defensive_half | 64.05 | 3.44 | 63.54 | 56.83 | 71.70 | 0.16 | 2.55 |
| game_average.positioning.percent_offensive_half | 35.95 | 3.44 | 36.46 | 28.30 | 43.17 | -0.16 | 2.55 |
| game_average.positioning.percent_behind_ball | 73.69 | 2.30 | 73.81 | 68.82 | 78.73 | -0.13 | 2.42 |
| game_average.positioning.percent_infront_ball | 26.31 | 2.30 | 26.19 | 21.27 | 31.18 | 0.13 | 2.42 |
| game_average.demo.inflicted | 0.83 | 0.33 | 0.79 | 0.30 | 1.74 | 0.45 | 2.98 |
| game_average.demo.taken | 0.86 | 0.22 | 0.82 | 0.33 | 1.38 | 0.66 | 3.28 |
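For readers who want to see what the skewness and kurtosis columns measure, the following sketch writes out the moment-based definitions in base R. It assumes the kurtosis in the table is the non-excess (Pearson) form, under which a normal distribution scores 3. The text does not say so, but the values clustering around 3 suggest it. The helper names and the small example vector are hypothetical.

```r
# k-th central moment, using the population (divide-by-n) form.
central_moment <- function(x, k) mean((x - mean(x))^k)

# Skewness: third central moment scaled by the variance to the power 3/2.
skew_by_hand <- function(x) central_moment(x, 3) / central_moment(x, 2)^1.5

# Non-excess kurtosis: fourth central moment over the squared variance.
# A normal distribution scores 3 on this scale.
kurt_by_hand <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

# Small made-up vector with a longer right tail.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
skew_value <- skew_by_hand(x)  # 0.65625
kurt_value <- kurt_by_hand(x)  # 2.78125
print(c(skewness = skew_value, kurtosis = kurt_value))
```

A positive skewness means a longer right tail. A kurtosis above 3 on this scale means heavier tails than a normal distribution.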

At a quick glance, each of these variables looks like a candidate for use in the ARC and further application of social scientific research methods. Skewness stays within about ±1.2 for every variable, and most kurtosis values sit close to 3, which is the value for a normal distribution on the non-excess scale these figures appear to use; the largest departure is game_average.movement.percent_high_air at 6.04. The standard deviations appear large enough relative to the means to give the variables useful sensitivity. If we take a graphical look at the distributions, we can see fairly healthy distributions that meet most of the common assumptions necessary for further statistical exploration:

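library(tidyr)    # gather()
library(ggplot2)
# after_stat(density) replaces the deprecated ..density.. notation.
selectedData %>% gather() %>%
  ggplot(aes(value)) +
    facet_wrap(~ key, scales = "free", ncol = 3, shrink = FALSE) +
    geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white") +
    geom_density(alpha = .2, fill = "#FF6666")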
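A visual check can be complemented with a formal one. The sketch below runs base R's shapiro.test() on each column of a data frame. It uses deterministic stand-in columns (exact quantiles of a normal and an exponential distribution), not the ARC sample, and the Shapiro-Wilk test is a substitute technique chosen here for illustration, not one used in the original analysis.

```r
# Deterministic stand-in data: exact quantiles of a normal and an exponential
# distribution. These are hypothetical columns, not ARC variables.
demo <- data.frame(
  roughly_normal = qnorm(ppoints(60), mean = 50, sd = 3),
  right_skewed   = qexp(ppoints(60), rate = 1)
)

# Shapiro-Wilk p-value per column; small values suggest non-normality.
normality_p <- sapply(demo, function(col) shapiro.test(col)$p.value)
print(round(normality_p, 3))
```

Applied to the real data, the same sapply() call over selectedData would flag the variables whose distributions depart most from normal.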

Relationships

Let’s take a look at how these variables relate to one another through a correlation matrix:

library(corrplot)  # corrplot()
corrplot(cor(selectedData), tl.cex = 0.6)

Most of the stronger relationships (|r| > .5) exist between two inherently connected variables. For example, the relationship between game_average.boost.bpm and game_average.boost.amount_collected is obvious: the more boost a player consumes, the more the player needs to collect. Similarly, the strong inverse relationship between game_average.positioning.percent_defensive_third and game_average.positioning.percent_offensive_third exists because the two variables measure essentially the same thing: positioning.
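Pairs above the |r| > .5 cutoff can also be listed programmatically instead of being read off the plot. The sketch below is one way to do that in base R. The helper name strong_pairs and the small demonstration data frame are hypothetical, with the two "half" columns constructed to sum to 100 so that they correlate at exactly -1.

```r
# List variable pairs whose absolute correlation exceeds a cutoff.
strong_pairs <- function(df, cutoff = 0.5) {
  r <- cor(df)
  # Keep each pair once by looking only at the upper triangle.
  idx <- which(abs(r) > cutoff & upper.tri(r), arr.ind = TRUE)
  data.frame(
    var1 = rownames(r)[idx[, "row"]],
    var2 = colnames(r)[idx[, "col"]],
    r    = r[idx],
    stringsAsFactors = FALSE
  )
}

# Made-up data: the two positioning columns are complements of each other.
demo <- data.frame(
  percent_defensive_half = c(64, 60, 70, 58, 66),
  percent_offensive_half = c(36, 40, 30, 42, 34),
  demo_inflicted         = c(0.8, 0.9, 0.8, 0.7, 0.9)
)

pairs_found <- strong_pairs(demo)
print(pairs_found)
```

Calling strong_pairs(selectedData) on the real sample would return the same pairs that stand out in the correlation plot, along with their coefficients.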

However, some of the moderate relationships give us an additional layer of insight into player performance. For example, the relationships between game_average.boost.avg_amount and the game_average.boost.percent_boost_* variables show us that players who hold a higher amount of boost on average tend to hold boost amounts between 50 and 100. Some surprising sets of relationships come from the game_average.boost.amount_stolen variable. We can see that players who steal more boost also spend more time in the offensive third, spend more time at supersonic speed, spend less time at slow speed, and have overall higher speeds and travel further distances.

Indeed, there appear to be a few potential factors among these relationships. Enough variables track together to justify some exploratory factor analysis. However, a previous analysis revealed that player positioning statistics are also related to win/loss status, confounding those variables’ values with an effect that should be controlled for by this study. As such, we will also eliminate another handful of these variables, leaving only those that demonstrate independence from game outcomes:

  selectedData <- data %>% select(all_of(
    c(
      "game_average.boost.bpm",
      "game_average.boost.avg_amount",
      "game_average.boost.amount_stolen",
      "game_average.boost.percent_zero_boost",
      "game_average.boost.percent_full_boost",
      "game_average.boost.amount_overfill",
      "game_average.boost.amount_overfill_stolen",
      "game_average.boost.amount_used_while_supersonic",
      "game_average.boost.percent_boost_25_50",
      "game_average.boost.percent_boost_50_75",
      "game_average.boost.percent_boost_75_100",
      "game_average.movement.avg_speed",
      "game_average.movement.total_distance",
      "game_average.movement.count_powerslide",
      "game_average.movement.avg_powerslide_duration",
      "game_average.movement.avg_speed_percentage",
      "game_average.movement.percent_slow_speed",
      "game_average.movement.percent_boost_speed",
      "game_average.movement.percent_supersonic_speed",
      "game_average.movement.percent_ground",
      "game_average.movement.percent_low_air",
      "game_average.movement.percent_high_air",
      "game_average.demo.inflicted",
      "game_average.demo.taken"
    )
  ))

Another look at the correlation values:

corrplot(cor(selectedData), tl.cex = 0.6)

Factors

To consolidate this still fairly large variable set even further into a few salient variables fit for human consumption, we will use standard exploratory factor analysis (EFA) methods. To ensure the data is appropriate for EFA, we first apply the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy:

library(psych)  # KMO(), cortest.bartlett(), scree()
KMOresults <- KMO(selectedData)
KMOresults$MSA
## [1] 0.6690104

According to Kaiser (1974), an MSA value above .60 indicates factorability. The overall KMO for the data is .67, above the cutoff, but we can prune the variables whose individual MSA values do not exceed .60 and so do not appear to contribute to potential factors:

selectedDataFiltered <- selectedData[, KMO(cor(selectedData))$MSAi>0.60]
selectedData <- selectedDataFiltered
KMOresults <-KMO(selectedData)
KMOresults$MSA
## [1] 0.7469851

This brings the overall KMO value up to .75. Let’s take a look at the MSA values of each individual variable. These are recomputed on the pruned set, which is why a few of them now fall below .60:

kable(as.data.frame(KMOresults$MSAi), col.names = c("MSA"))

| variable | MSA |
|---|---|
| game_average.boost.bpm | 0.7227388 |
| game_average.boost.avg_amount | 0.6374632 |
| game_average.boost.amount_stolen | 0.7890482 |
| game_average.boost.percent_zero_boost | 0.6833374 |
| game_average.boost.percent_full_boost | 0.5358549 |
| game_average.boost.amount_overfill | 0.7553238 |
| game_average.boost.amount_overfill_stolen | 0.7245455 |
| game_average.boost.percent_boost_50_75 | 0.7033235 |
| game_average.boost.percent_boost_75_100 | 0.5453206 |
| game_average.movement.avg_speed | 0.7803817 |
| game_average.movement.total_distance | 0.8841638 |
| game_average.movement.count_powerslide | 0.5448400 |
| game_average.movement.avg_powerslide_duration | 0.7283263 |
| game_average.movement.avg_speed_percentage | 0.7803830 |
| game_average.movement.percent_slow_speed | 0.8626720 |
| game_average.movement.percent_supersonic_speed | 0.8838672 |
| game_average.demo.inflicted | 0.9011852 |
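The MSA values above compare squared correlations with squared partial correlations. As a rough illustration of that calculation, the sketch below computes an overall and per-variable MSA in base R. The helper name kmo_by_hand is hypothetical, and it is demonstrated on the built-in mtcars data set, not the ARC sample, so the numbers it prints are unrelated to the table above.

```r
# Overall and per-variable measure of sampling adequacy, computed by hand.
kmo_by_hand <- function(df) {
  r <- cor(df)
  # Scaling the inverse correlation matrix to unit diagonal gives the
  # partial correlations (up to sign) in its off-diagonal cells.
  q <- cov2cor(solve(r))
  diag(r) <- 0
  diag(q) <- 0
  list(
    MSA  = sum(r^2) / (sum(r^2) + sum(q^2)),
    MSAi = colSums(r^2) / (colSums(r^2) + colSums(q^2))
  )
}

# Demonstration on a built-in data set (not the ARC sample).
kmo_demo <- kmo_by_hand(mtcars[, 1:6])
print(round(kmo_demo$MSA, 2))
print(round(kmo_demo$MSAi, 2))
```

Because each variable's MSA depends on its partial correlations with all the others, removing variables changes the values for those that remain.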

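Bartlett’s Test of Sphericity is significant, which indicates that the correlation matrix differs from an identity matrix and that the variables are correlated enough for factor analysis (the p-value prints as 0 because it is too small to be distinguished from zero):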

cortest.bartlett(selectedData)
## R was not square, finding R from data
## $chisq
## [1] 2008.93
## 
## $p.value
## [1] 0
## 
## $df
## [1] 136
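The statistic and degrees of freedom reported above follow from the determinant of the correlation matrix. The sketch below reproduces the standard form of Bartlett's test in base R. The helper name bartlett_by_hand is hypothetical, and the demonstration again uses mtcars in place of the ARC sample. With p = 17 variables the degrees of freedom are 17 × 16 / 2 = 136, which matches the output above.

```r
# Bartlett's test of sphericity computed from the correlation matrix.
bartlett_by_hand <- function(df) {
  r <- cor(df)
  n <- nrow(df)  # number of observations
  p <- ncol(df)  # number of variables
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(r))
  dof <- p * (p - 1) / 2
  list(
    chisq   = chisq,
    df      = dof,
    p.value = pchisq(chisq, dof, lower.tail = FALSE)
  )
}

# Demonstration on a built-in data set (not the ARC sample).
bartlett_demo <- bartlett_by_hand(mtcars[, 1:6])
print(bartlett_demo)

# Degrees of freedom for the 17 variables retained in the text.
df_17 <- 17 * (17 - 1) / 2
print(df_17)
```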

Using a scree plot, we can see that there are one or two potential factors in the set:

scree(selectedData, pc=FALSE)
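The values a scree plot displays are eigenvalues. As a simplified sketch, the code below takes the eigenvalues of a plain correlation matrix in base R and counts how many exceed 1 (the Kaiser criterion). This is an approximation of what is plotted above: scree() with pc=FALSE shows factor-analysis eigenvalues, whereas this uses principal-component eigenvalues, and the data is the built-in mtcars set, not the ARC sample.

```r
# Eigenvalues of the correlation matrix, largest first.
ev <- eigen(cor(mtcars[, 1:6]))$values

# Kaiser criterion: count eigenvalues greater than 1.
n_kaiser <- sum(ev > 1)

print(round(ev, 2))
print(n_kaiser)
```

The eigenvalues of a correlation matrix sum to the number of variables, so each one can be read as "how many variables' worth" of variance that component carries.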

Let’s extract two factors:

fit <- factanal(selectedData, 2, nstart = 5, rotation = "varimax", lower = 0.01)

print(fit, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = selectedData, factors = 2, rotation = "varimax",     nstart = 5, lower = 0.01)
## 
## Uniquenesses:
##                         game_average.boost.bpm 
##                                           0.51 
##                  game_average.boost.avg_amount 
##                                           0.03 
##               game_average.boost.amount_stolen 
##                                           0.51 
##          game_average.boost.percent_zero_boost 
##                                           0.57 
##          game_average.boost.percent_full_boost 
##                                           0.63 
##             game_average.boost.amount_overfill 
##                                           0.33 
##      game_average.boost.amount_overfill_stolen 
##                                           0.40 
##         game_average.boost.percent_boost_50_75 
##                                           0.47 
##        game_average.boost.percent_boost_75_100 
##                                           0.38 
##                game_average.movement.avg_speed 
##                                           0.01 
##           game_average.movement.total_distance 
##                                           0.27 
##         game_average.movement.count_powerslide 
##                                           0.81 
##  game_average.movement.avg_powerslide_duration 
##                                           0.77 
##     game_average.movement.avg_speed_percentage 
##                                           0.01 
##       game_average.movement.percent_slow_speed 
##                                           0.05 
## game_average.movement.percent_supersonic_speed 
##                                           0.19 
##                    game_average.demo.inflicted 
##                                           0.79 
## 
## Loadings:
##                                                Factor1 Factor2
## game_average.boost.bpm                          0.69          
## game_average.boost.amount_stolen                0.66          
## game_average.movement.avg_speed                 0.98          
## game_average.movement.total_distance            0.81          
## game_average.movement.avg_speed_percentage      0.98          
## game_average.movement.percent_slow_speed       -0.97          
## game_average.movement.percent_supersonic_speed  0.89          
## game_average.boost.avg_amount                           0.98  
## game_average.boost.percent_zero_boost                  -0.65  
## game_average.boost.percent_full_boost                   0.61  
## game_average.boost.amount_overfill              0.40    0.71  
## game_average.boost.amount_overfill_stolen       0.41    0.66  
## game_average.boost.percent_boost_50_75                  0.72  
## game_average.boost.percent_boost_75_100                 0.79  
## game_average.movement.count_powerslide          0.41          
## game_average.movement.avg_powerslide_duration  -0.45          
## game_average.demo.inflicted                     0.35          
## 
##                Factor1 Factor2
## SS loadings       6.06    4.20
## Proportion Var    0.36    0.25
## Cumulative Var    0.36    0.60
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 1195.64 on 103 degrees of freedom.
## The p-value is 1.94e-185
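
The "Proportion Var" and "Cumulative Var" rows in the output follow directly from the SS loadings: each is the sum of squared loadings divided by the number of variables. The short sketch below redoes that arithmetic with the figures printed above (17 variables, SS loadings of 6.06 and 4.20). The variable names are chosen for illustration.

```r
# SS loadings as printed in the factanal() output above.
ss_loadings <- c(Factor1 = 6.06, Factor2 = 4.20)
n_vars <- 17  # variables retained after the KMO pruning

# Share of total variance attributed to each factor, and the running total.
prop_var <- ss_loadings / n_vars
cum_var  <- cumsum(prop_var)

print(round(prop_var, 2))  # 0.36 and 0.25
print(round(cum_var, 2))   # 0.36 and 0.60
```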