Path of Exile is an action RPG in which the player kills enemies using a preset character that allows extensive customization. Killed enemies drop loot and other bonuses that the player can use to augment and upgrade their character. Path of Exile is an online game, which means that players can interact with one another by trading items or playing in the same instance.
BPL is a quarterly community event that coincides with each major update that Path of Exile receives. Around 2000 players are placed inside an isolated, newly created instance (referred to as a “league”), where they must work together with their designated team to acquire items and accomplish objectives (“bounties”) defined by the organizers. Doing so accumulates points towards their team total, and at the end of the four-day event the teams are ranked by the points they have acquired. There are 4 teams for BPL 10.
Each preset character has a specific upgrade called an “ascendancy”. This allows the character to gain unique powers that are inaccessible to other presets. Each character is permitted only one ascendancy. To balance out the teams, the organizers have made it so that each team can only have a subset of the presets. One of the ascendancies, “Saboteur”, is banned.
One thing to note is that ascendancy bonuses are acquired in portions. To be fully ascended is to grab all possible upgrades related to that ascendancy.
Each character also has a stage-based development referred to as “levels”. Characters start off weak. As they kill enemies they acquire a numerical reward (“experience points”). When they have acquired sufficient experience points, they “level up” and reach the next stage of their character progression. In Path of Exile, a character can have levels from 1 to 100, inclusive of both ends. Typically, having more levels means that your character becomes stronger, as each level grants bonuses with which to customize the character.
The playthrough of a character is broken down into two parts: campaign, where the character goes through a story-driven progression, and endgame, where the player can tackle various content unavailable in campaign. All of BPL’s bounties occur in the endgame.
A large portion of the endgame takes place in the Atlas, where players can enter special areas to fight harder enemies and acquire more rewarding loot. Completing sections of the Atlas awards players with the power to customize the Atlas in the form of “atlas passives”. Note that a player does not need all atlas passives to participate in the endgame, but having more gives the player more options in their playstyle.
The data is broken down into two CSV files. bpl-10_teams.csv contains information at the end of the event. It contains finalized data of all the players, including level, whether or not a player has fully ascended, and the number of atlas passives they have acquired. bpl-10_level_data.csv contains progressive tracking of every single character’s levelling over the course of the event. Snapshots of each character are taken approximately every 15 minutes. Each entry contains the player, their team name, their character level, and the time of the snapshot.
For anonymity and ethics, player names are replaced with hashed values. Special thanks to Moowiz for providing the data.
There’s a lot of cleaning required for the bpl-10_level_data.csv dataset. We’ll come back to it when we start actually analyzing. For now let’s look at bpl-10_teams.csv.
For each player, bpl-10_teams.csv contains the level, whether or not the player has fully ascended their character, and the number of atlas passives that they’ve allocated.
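To make the layout concrete, here is a minimal sketch of what a few rows of that file might look like once loaded. The column names are the ones used later in this analysis; the three rows and the `teams_example` name are made up for illustration, and the real file may contain additional columns.

```r
# Hypothetical rows; only the column names come from this analysis.
csv_text <- "user_id,level,fully_ascended,num_atlas_passives_allocated
a1f3,97,TRUE,118
b7c2,45,FALSE,0
c9d4,88,TRUE,76"

# read.csv() parses TRUE/FALSE into a logical column automatically
teams_example <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(teams_example)

# Share of these example players at or above level 69
endgame_share <- mean(teams_example$level >= 69)
endgame_share
```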
As a brief summary of the endgame, players will progress their character’s Atlas, which takes the form of a large number of repeatable zones with randomized rewards and difficulty, known as maps. Players can customize their Atlas with Atlas passives to acquire specific loot or fight specific monsters. Every completion of a unique map yields the player 1 Atlas passive. To simplify, a player’s allocated Atlas passive count is the same as the player’s Atlas completion*. This makes it easy to assess a player’s progression in the endgame, as certain bonuses or content in the Atlas are locked behind a certain number of Atlas passives. The Atlas is divided into 3 regions of increasing difficulty: white, yellow, and red.
*There are 100 maps in the Atlas. The other 32 passives come from additional challenges a player can undertake.
The graph below is the final tally of everyone’s achievements on the last day.
print(paste("The percentage of players at or above level 69 is", round(100 * nrow(teams[teams$level >= 69,])/nrow(teams), 2),"%"))
## [1] "The percentage of players at or above level 69 is 83.92 %"
hist(teams$level, main="Distribution of Player Levels", xlab="Player Level", col="magenta")
As we can see, most of the players sit on the right-hand side of the distribution. All the bounties in BPL are meant to be completed in the latter portion of the endgame, starting at level 69, so our focus lies only on the players at level 69 or above.
endgame_team = teams[teams$level >= 69,]
h <- hist(endgame_team$level, xlim=c(69,100), main="Distribution of Endgame Levels",xlab="Player level", col='blue', breaks=c(69,79, 89, 100))
We can also look at Atlas completion as a way to break down the endgame players. Once again, we only consider level 69 and above, because players below that level cannot access the endgame.
h <- hist(endgame_team$num_atlas_passives_allocated, xlim=c(0,132), main="Atlas Completion Distribution ",xlab="Atlas Completion", col='red')
text(h$mids, h$counts, labels = h$counts)
We can see the distribution is once again concentrated on the right end of the histogram, with the exception of the 0-10 bin, which holds a lot of players. One can assume that these players got to the endgame and simply did not progress any further. There is a sharp jump at 70-80, and the peak frequency lies at 110-120. Only 5 players are in the 130-132 bin, suggesting that the last few points are something that only the completionists were willing to go for.
The peak at 70-80 and the peak at 110-120 are what I want to look into. They seem to imply certain breakpoints or thresholds that players have to cross while completing their atlas. Assuming that they represent the cutoffs between players in white, yellow, and red maps of the Atlas, we get something like the following:
# Reuse the default bin edges so that each bar can be coloured by its left edge
breaks <- hist(endgame_team$num_atlas_passives_allocated, plot=FALSE)$breaks
left_edges <- head(breaks, -1)
break_colors <- rep("grey", length(left_edges))
break_colors[left_edges >= 70] <- "yellow"
break_colors[left_edges >= 110] <- "red"
hist(endgame_team$num_atlas_passives_allocated, breaks = breaks, col = break_colors, main = "Atlas Completion with Breaks", xlab = "Atlas Completion")
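The same assumed cutoffs can also be attached to each player as a label rather than only as a bar colour. The sketch below uses base R's `cut()`; the `atlas_region` helper and the example counts are hypothetical, and the 70 and 110 thresholds are my reading of the histogram, not official boundaries.

```r
# Label an Atlas passive count with the assumed region:
# [0, 70) white, [70, 110) yellow, [110, 132] red
atlas_region <- function(passives) {
  cut(passives,
      breaks = c(0, 70, 110, 132),
      labels = c("white", "yellow", "red"),
      right = FALSE,          # intervals are closed on the left
      include.lowest = TRUE)  # so that 132 falls into the last interval
}

example_counts <- c(5, 75, 115, 132)  # made-up passive counts
regions <- atlas_region(example_counts)
table(regions)
```

With the real data this could be applied as `atlas_region(endgame_team$num_atlas_passives_allocated)` to count players per assumed region.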
There is one more characteristic of the bpl-10_teams.csv that we haven’t looked at, and that is fully_ascended. Each character can acquire a subclass called an ascendancy, which increases the player’s power. One thing to note is that the ascendancy is unlocked piece by piece. 3 of the 4 “trials” to acquire ascendancy power occur during the campaign, while the last one occurs during endgame, somewhere between yellow and early red maps. A fully ascended character represents much of the power that a character can have.
endgame_team_fully <- endgame_team[endgame_team$fully_ascended == TRUE, ]
h <- hist(endgame_team_fully$level, xlim=c(69,100), main="Distribution of Fully Ascended Character levels",xlab="Player level", col='blue', breaks=c(69,79, 89, 100))
# Same colouring as before, restricted to fully ascended characters
breaks <- hist(endgame_team_fully$num_atlas_passives_allocated, plot = FALSE)$breaks
left_edges <- head(breaks, -1)
break_colors <- rep("grey", length(left_edges))
break_colors[left_edges >= 70] <- "yellow"
break_colors[left_edges >= 110] <- "red"
h <- hist(endgame_team_fully$num_atlas_passives_allocated, breaks = breaks, col = break_colors, main="Atlas Completion of Fully Ascended Characters", xlab="Atlas Completion")
text(h$mids, h$counts, labels = h$counts)
It is hard to determine whether the divisions I have set in player level or atlas passive count are an accurate representation of actual player progression. But one thing is clear: player power increases with the accumulation of levels and Atlas progression. These two serve as a good metric for assessing how much a player might contribute to their team score.
I am adjacent to the BPL organizers, and one of the questions they often have to deal with is the problem of good players. Most players are sorted randomly in the event, but a select few whose skills are well known have to be drafted in specifically to avoid imbalance among the teams. Colloquially, these players are known as “pushers”, as they rush through the initial content very quickly.
This poses the question: can we identify such members? Typically such players are known through anecdotal experience and reputation, but is it possible to discern them by modeling a player’s progression?
One thing that I never mentioned is time. After all, the graphs above are made with data from after the event. It doesn’t capture a player’s progress during any part of the event. Some players, due to personal schedule, might not play at certain hours, and hence might not progress their characters during that timeframe.
bpl-10_level_data.csv contains 15-minute snapshots of every single character over the course of 4 days. Of course, there are some problems with the data. For instance, the snapshots do not occur exactly 15 minutes apart. There’s also the possibility that not everyone signed up to play at the start, meaning that each snapshot will differ in player count. Additionally, system failures may have prevented certain snapshots from being recorded.
# First convert each timestamp to epoch seconds, then to a 15-minute "bin" number
levels$time_fetched <- as.integer(as.POSIXct(levels$time_fetched))
levels$time_fetched <- replace(levels$time_fetched, is.na(levels$time_fetched), 0)
# Floor to a multiple of 900 s (15 min), then count bins from the reference epoch 1673668800
roundto15 <- function(a) {
  offset <- a %% 900
  (a - offset - 1673668800) / 900
}
levels$time_fetched <- roundto15(levels$time_fetched)
# Drop incomplete rows and characters without a team
levels <- levels[complete.cases(levels), ]
levels <- levels[levels$team != "NULL", ]
# One row per character, one char_level.<bin> column per snapshot; missing snapshots become 0
levels <- reshape(levels, idvar=c("char_name_hash", "team"), timevar="time_fetched", direction="wide")
levels[is.na(levels)] <- 0
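To show what the binning and reshaping steps do, here is a small self-contained sketch on made-up snapshots. Only the reference epoch 1673668800 and the column names come from the code above; the `bin_15min` helper, the two characters, and their levels are hypothetical. Integer division gives the same bin as the floor-then-divide approach because the reference epoch is itself a multiple of 900.

```r
reference_epoch <- 1673668800  # constant taken from the cleaning code above

# 15-minute bin index relative to the reference epoch
bin_15min <- function(epoch_seconds) (epoch_seconds - reference_epoch) %/% 900

# Made-up snapshots: slightly irregular timing, and y2 has no snapshot in bin 1
snapshots <- data.frame(
  char_name_hash = c("x1", "x1", "y2", "y2"),
  team           = c("A", "A", "B", "B"),
  time_fetched   = reference_epoch + c(10, 905, 20, 1850),
  char_level     = c(1, 4, 1, 7)
)
snapshots$time_fetched <- bin_15min(snapshots$time_fetched)

# Long to wide: one row per character, one char_level.<bin> column per bin
wide <- reshape(snapshots, idvar = c("char_name_hash", "team"),
                timevar = "time_fetched", direction = "wide")
wide[is.na(wide)] <- 0  # a 0 here means "no snapshot", not "level 0"
print(wide)
```

Note that filling gaps with 0 mixes "not recorded" with a level value, which is worth keeping in mind when these columns are later used as predictors.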
Our question revolves around finding out whether we can identify these “pushers” through the use of a model. Since we are performing a binary classification, I propose using a logistic regression model. However, this does require me to label each observation. I’ll make assumptions in the following subsection.
One thing that I’ll be utilizing for this model is k-fold cross-validation. I’ll use the standard 10 folds.
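For readers unfamiliar with the idea, the sketch below spells out 10-fold cross-validation by hand in base R on simulated data. It is only an illustration of the technique; the `toy` data frame and all variable names are made up, and it uses a plain `glm()` rather than the model fitted later.

```r
set.seed(1)
n <- 50
toy <- data.frame(x = rnorm(n))
toy$y <- rbinom(n, 1, plogis(toy$x))  # simulated binary outcome

k <- 10
# Randomly assign each row to one of k equally sized folds
fold_id <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other k - 1 folds, score on the held-out fold
fold_accuracy <- sapply(seq_len(k), function(fold) {
  train_rows <- toy[fold_id != fold, ]
  test_rows  <- toy[fold_id == fold, ]
  fit <- glm(y ~ x, data = train_rows, family = binomial)
  predicted <- predict(fit, newdata = test_rows, type = "response") > 0.5
  mean(predicted == (test_rows$y == 1))
})

mean(fold_accuracy)  # cross-validated accuracy estimate
```

Every row is used for testing exactly once, which gives a less optimistic estimate than scoring the model on the data it was fitted to.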
One thing that I should clear up concerning the data is that the column user_id in bpl-10_teams.csv is not the same as char_name_hash in bpl-10_level_data.csv. user_id refers to the player’s account name, whereas char_name_hash refers to the name of the character that the player created for the event. Unfortunately, I was not able to get a shared key column that would allow me to join the two.
But I can use the teams dataset as a baseline to estimate how many Pushers I should expect to be present. The assumption I make is that Pushers at the end have 1) fully ascended their character, 2) levelled to at least 95, and 3) allocated at least 110 of the 132 Atlas passives*. My theory is that the pushers, whether through extreme early level “pushing” or consistent playtime, would naturally have these three characteristics at the end.
*100 for map completion, and another 10 from quests following Atlas progression.
pusher_estim <- endgame_team_fully[endgame_team_fully$level >= 95 & endgame_team_fully$num_atlas_passives_allocated >= 110, ]
print(paste("The number of pushers is expected to be roughly at most", nrow(pusher_estim), "out of", nrow(endgame_team), "players who have achieved endgame."))
## [1] "The number of pushers is expected to be roughly at most 185 out of 1540 players who have achieved endgame."
We need to classify each observation as either a Pusher or not. The above gives us a rough estimate to work with, but that estimate comes at the end of the event. Since Pushers are proficient at the game, they should naturally reach a certain level threshold more quickly than other players. Hence, one way to identify them is to find those who have levelled quickly. But pushing quickly isn’t sufficient, as bounties typically require more playtime to grind out results. So not only do we have to ensure that a character we identify as a Pusher acquires levels very quickly, we also need evidence of progression past a certain point.
levels <- transform(levels, pusher = as.factor(ifelse((char_level.61 >= 90 & char_level.380 >= 95), TRUE, FALSE)))
hours <- 1:187
hours <- hours[(hours %% 4) == 1]
hours <- hours + 2
levels <- levels[,c(hours, 377)]
pie(table(levels$pusher), main = "Portion of Pushers", col=c("red", "blue"))
My main concern is that we are forced to use level predictors to classify pushers. Hence, when performing logistic regression, our model might not converge because the two categories are too well separated by those predictors. To reduce that risk, I used two predictors to classify pushers and then subset the predictors so that we only measure by the hour rather than every 15 minutes (the amount one can accomplish in 15 minutes in POE is limited). We also only account for levels in the first two days. This way we get a larger difference between adjacent predictors, and the identifying predictors are no longer in the dataset.
We are doing a logistic regression using k-fold cross-validation.
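The separation problem is easy to reproduce on a tiny made-up dataset. In the sketch below the label is defined directly from the predictor, much like the pusher label is defined from levels, so a plain `glm()` can separate the classes perfectly and warns about it. The data and variable names are hypothetical.

```r
# Made-up levels; the label is a deterministic function of the predictor
separated <- data.frame(level = c(60, 65, 70, 75, 80, 90, 92, 94, 96, 98))
separated$pusher <- separated$level >= 90

# Collect the warnings that glm() raises instead of printing them
caught <- character(0)
fit <- withCallingHandlers(
  glm(pusher ~ level, data = separated, family = binomial),
  warning = function(w) {
    caught <<- c(caught, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(caught)     # warnings raised while fitting
print(coef(fit))  # the slope grows very large as the fit tries to become a step function
```

A penalised fit such as ridge regression keeps the coefficients finite in this situation, which is one reason to reach for it here.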
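library(caret)  # trainControl() and train(); method = 'glmnet' also needs the glmnet package
set.seed(0)
# 10-fold cross-validation
train.control <- trainControl(method = "cv", number = 10)
# alpha = 0 selects the ridge penalty; lambda is fixed at 1 rather than tuned
model <- train(form = pusher ~ ., data = levels, method = 'glmnet', tuneGrid = expand.grid(alpha = 0, lambda = 1), trControl = train.control, family = 'binomial')
summary(model)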
## Length Class Mode
## a0 100 -none- numeric
## beta 4700 dgCMatrix S4
## df 100 -none- numeric
## dim 2 -none- numeric
## lambda 100 -none- numeric
## dev.ratio 100 -none- numeric
## nulldev 1 -none- numeric
## npasses 1 -none- numeric
## jerr 1 -none- numeric
## offset 1 -none- logical
## classnames 2 -none- character
## call 5 -none- call
## nobs 1 -none- numeric
## lambdaOpt 1 -none- numeric
## xNames 47 -none- character
## problemType 1 -none- character
## tuneValue 2 data.frame list
## obsLevels 2 -none- character
## param 1 -none- list
Instead, I decided to look at Atlas completion by the end of the event, which means going back to teams. While it does not offer time-sensitive data like levels, I’d like to observe the significance of the characteristics and how they might help players progress through the game quickly. I am going to introduce a new dataset. **bpl-1-_ascendancy.csv** records what each player chose as their ascendancy as well as the team they were on. While I am not certain how their team might influence their Atlas completion, the ascendancy they chose may have a huge impact on it.
pie(table(ascendancies$character_class), main = "Ascendancy Breakdown", cex=0.5)
We can see that a large portion of the playerbase are playing Occultist, Elementalist and Pathfinder. We can also see unascended characters (Witch, Templar, Scion, Shadow, Marauder, Ranger, Duelist) making up a non-trivial amount. We’ll need more information.
Both ascendancy and teams have the same column user_id, so I am going to join them.
combined <- merge(x=ascendancies, y=teams, by = "user_id")
#combined <- transform(combined, highlevel = as.factor(ifelse(level >= 95, TRUE, FALSE)))
combined <- combined[,-1]
combined$team_id <- as.factor(combined$team_id)
#print(nrow(combined[combined$highlevel == TRUE,]))
#print(nrow(combined[combined$highlevel == FALSE,]))
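One property of `merge()` worth keeping in mind is that, by default, it performs an inner join: players who appear in only one of the two tables are silently dropped. The sketch below shows this on made-up rows; the `*_toy` data frames and their values are hypothetical.

```r
# Made-up tables sharing a user_id column
ascend_toy <- data.frame(
  user_id         = c("u1", "u2", "u3"),
  character_class = c("Occultist", "Pathfinder", "Witch"),
  team_id         = c(170, 171, 172)
)
teams_toy <- data.frame(
  user_id = c("u1", "u2", "u4"),
  level   = c(96, 81, 70)
)

# Default merge() keeps only user_ids present in both tables
joined <- merge(x = ascend_toy, y = teams_toy, by = "user_id")
nrow(joined)

# user_ids that would be lost from the ascendancy table
unmatched <- setdiff(ascend_toy$user_id, teams_toy$user_id)
unmatched
```

Comparing `nrow(combined)` against `nrow(teams)` and `nrow(ascendancies)` would show whether any players were lost in the real join.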
The plan is to split the set 70:30, then find a linear model.
set.seed(0)
# Logical mask: roughly 70% of rows go to training, the rest to testing
in_train <- sample(c(TRUE, FALSE), nrow(combined), replace = TRUE, prob = c(0.7, 0.3))
train <- combined[in_train, ]
test <- combined[!in_train, ]
model2 <- lm(num_atlas_passives_allocated ~ ., data = train)
summary(model2)
##
## Call:
## lm(formula = num_atlas_passives_allocated ~ ., data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -100.780 -17.400 4.214 21.956 64.978
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -52.17070 5.40558 -9.651 < 2e-16 ***
## character_classAssassin 13.81673 9.09635 1.519 0.129005
## character_classBerserker -13.56630 7.59520 -1.786 0.074288 .
## character_classChampion 12.19993 5.25346 2.322 0.020361 *
## character_classChieftain -19.68220 9.57057 -2.057 0.039915 *
## character_classDeadeye 16.19977 5.41594 2.991 0.002828 **
## character_classDuelist 19.64264 5.56722 3.528 0.000432 ***
## character_classElementalist 14.25798 3.92364 3.634 0.000289 ***
## character_classGladiator -4.40140 7.22617 -0.609 0.542562
## character_classGuardian 10.29668 5.24774 1.962 0.049946 *
## character_classHierophant 1.58269 6.85480 0.231 0.817436
## character_classInquisitor 6.68115 4.31207 1.549 0.121509
## character_classJuggernaut 5.16293 4.48614 1.151 0.249985
## character_classMarauder 37.26567 10.90225 3.418 0.000648 ***
## character_classNecromancer -0.95111 5.15960 -0.184 0.853775
## character_classOccultist 9.10576 4.02445 2.263 0.023812 *
## character_classPathfinder -3.78093 4.17053 -0.907 0.364783
## character_classRaider -5.83925 4.32428 -1.350 0.177124
## character_classRanger 20.82786 4.61074 4.517 6.79e-06 ***
## character_classScion 12.47759 4.84407 2.576 0.010101 *
## character_classShadow 8.12971 12.08456 0.673 0.501226
## character_classSlayer 8.43292 4.87976 1.728 0.084183 .
## character_classTemplar 10.08420 6.74741 1.495 0.135262
## character_classTrickster 9.57383 4.83538 1.980 0.047904 *
## character_classWitch 19.43400 5.88572 3.302 0.000984 ***
## team_id171 -3.12616 2.32369 -1.345 0.178732
## team_id172 1.16619 2.59686 0.449 0.653446
## team_id173 -2.56683 2.58077 -0.995 0.320102
## level 0.99088 0.06039 16.409 < 2e-16 ***
## fully_ascendedTRUE 41.15580 2.55845 16.086 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 28.15 on 1405 degrees of freedom
## Multiple R-squared: 0.6113, Adjusted R-squared: 0.6033
## F-statistic: 76.19 on 29 and 1405 DF, p-value: < 2.2e-16
Interestingly, Duelist, Marauder, and Witch are significant predictors. Those characters are unascended, so they lack a significant portion of their abilities, yet they still show a positive association with Atlas progression.
The coefficient for level is significant, and is 0.99088. The significance was expected, but the coefficient suggests that each level corresponds to roughly 1 point of Atlas completion, which, given the mismatched scaling of zone levels and player levels, does not make much sense.
The intercept is harder to interpret. It is the predicted Atlas completion for a level 0 character of the reference class and team (the ones absent from the coefficient table) that is not fully ascended, so its negative value is an extrapolation outside the data rather than a real completion count. It’s still significant given its p-value, though.
Full ascension is significant and its coefficient is 41.15580, which suggests that players gain a huge increase in Atlas completion simply by fully ascending. Or it could imply that completing a portion of the Atlas yields higher chances of getting fully ascended.
Teams were not significant. That is what I would hope for, since team placement is not supposed to influence Atlas progression if the teams are balanced.
Finally, the model has an R-squared of 0.6113. I’d prefer it if more of the data’s variability could be explained by the model, but I think the current data does not contain enough predictors to make a more accurate assessment.
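As a worked example of how these coefficients combine, the sketch below plugs the estimates from the summary table into the linear formula by hand. The two scenarios are my own choice for illustration: a fully ascended level 95 Elementalist on the reference team, and a level 69 character of the reference class on the reference team that is not fully ascended.

```r
# Estimates copied from the summary(model2) output above
coefs <- c(intercept      = -52.17070,
           elementalist   =  14.25798,
           level          =   0.99088,
           fully_ascended =  41.15580)

# Fully ascended level 95 Elementalist, reference team
pred_elementalist <- coefs[["intercept"]] + coefs[["elementalist"]] +
  coefs[["level"]] * 95 + coefs[["fully_ascended"]]

# Level 69 character of the reference class, not fully ascended
pred_fresh_endgame <- coefs[["intercept"]] + coefs[["level"]] * 69

round(c(pred_elementalist, pred_fresh_endgame), 2)
```

The second number shows that the negative intercept is already offset by the level term once a character reaches the endgame, so the intercept only looks odd when read at level 0.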
prediction <- predict(model2, newdata = test, interval= "confidence", type = "response")
head(prediction)
## fit lwr upr
## 1 87.13742 80.58679 93.68806
## 5 83.29419 76.45467 90.13372
## 7 74.16889 61.21733 87.12045
## 8 75.78546 70.40716 81.16377
## 16 31.54804 22.66221 40.43388
## 18 33.55102 25.03805 42.06399
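The intervals above are confidence intervals for the mean prediction, so they do not say how far individual players fall from their fitted value. One simple way to summarise that is the root mean squared error on the held-out rows. The sketch below defines a hypothetical `rmse` helper and runs it on made-up numbers.

```r
# Root mean squared error between observed and predicted values
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Made-up example values, not taken from the data
actual_example    <- c(90, 80, 30)
predicted_example <- c(87, 83, 33)

rmse(actual_example, predicted_example)
```

With the objects from this analysis, the call would presumably be `rmse(test$num_atlas_passives_allocated, prediction[, "fit"])`, which can then be compared against the residual standard error reported for the training data.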
Finally, consider what these models tell us that could apply to real life. To be honest, I feel that the applications of this particular model are not very useful, other than to validate players’ skills when they hit certain benchmarks. And even then, I do not think this is a qualifying metric for determining skill.
In a sense, this is akin to modelling any sort of group-based activity. Change the parameters around and this could easily be a company modelling the efficiency of their employees. I suppose this experience is akin to HR attempting to implement such a measure of productivity. I can also see the pitfalls of such a model, as many assumptions are made. If any of those assumptions were found to be false, the whole model would fall apart. I feel that this story is incomplete. There are a lot of limitations in dealing with this data, from the classification variable being too well separated to not being able to link teams and levels. Should I come back to this data (or acquire the data for the next event), I would probably try to acquire the missing links and form a more comprehensive model that may answer, or at least confirm, some of the assumptions I made for this investigation.
In conclusion, I suck at the game. Just kidding, there’s a lot of nuance to the performance of players in Path of Exile. Regardless, this has been quite an exploration. I’ve personally been a part of 5 BPLs now, and the breakdown and statistics of the playerbase in this community event have always fascinated me. I would like to play more with data from future events.
Following the mention of ridge regression for the model training, I decided to try it again. I had tried to fit it with lasso regression with no success and had somehow missed trying it with ridge. I create a test dataset out of the original to test against this model.
# createDataPartition() returns the row indices of a stratified 60% training sample
train_idx <- createDataPartition(levels$pusher, p=0.6, list=FALSE)
update_training <- levels[train_idx,]
update_testing <- levels[-train_idx,]
# Ridge-penalised logistic regression (alpha = 0) with the same 10-fold CV as before
model_update <- train(form = pusher ~ ., data = update_training, method = 'glmnet', tuneGrid = expand.grid(alpha = 0, lambda = 1), trControl = train.control, family = 'binomial')
pred <- predict(model_update, newdata=update_testing)
confusionMatrix(pred, update_testing$pusher)
## Confusion Matrix and Statistics
##
## Reference
## Prediction FALSE TRUE
## FALSE 857 18
## TRUE 0 0
##
## Accuracy : 0.9794
## 95% CI : (0.9677, 0.9878)
## No Information Rate : 0.9794
## P-Value [Acc > NIR] : 0.5622
##
## Kappa : 0
##
## Mcnemar's Test P-Value : 6.151e-05
##
## Sensitivity : 1.0000
## Specificity : 0.0000
## Pos Pred Value : 0.9794
## Neg Pred Value : NaN
## Prevalence : 0.9794
## Detection Rate : 0.9794
## Detection Prevalence : 1.0000
## Balanced Accuracy : 0.5000
##
## 'Positive' Class : FALSE
##
We have a working model with 0.9794 accuracy. However, that accuracy is identical to the no-information rate, and the confusion matrix shows that every character is predicted as FALSE: with FALSE as the positive class, the model produces only true positives (857) and false positives (18). In other words, the model simply labels everyone as a non-pusher, so despite its accuracy it is not useful for identifying pushers.
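The headline statistics can be recomputed by hand from the four counts in the confusion matrix, which makes it clearer why the accuracy is misleading here. The sketch below rebuilds the matrix from the output above and treats FALSE as the positive class, as caret did; the variable names are my own.

```r
# Counts copied from the confusion matrix above (rows: prediction, columns: reference)
cm <- matrix(c(857, 0, 18, 0), nrow = 2,
             dimnames = list(Prediction = c("FALSE", "TRUE"),
                             Reference  = c("FALSE", "TRUE")))

accuracy      <- sum(diag(cm)) / sum(cm)
no_info_rate  <- max(colSums(cm)) / sum(cm)                    # always guess the majority class
sensitivity   <- cm["FALSE", "FALSE"] / sum(cm[, "FALSE"])     # non-pushers correctly found
specificity   <- cm["TRUE", "TRUE"] / sum(cm[, "TRUE"])        # pushers correctly found
balanced_acc  <- (sensitivity + specificity) / 2

round(c(accuracy = accuracy, no_info_rate = no_info_rate,
        sensitivity = sensitivity, specificity = specificity,
        balanced_accuracy = balanced_acc), 4)
```

Because only 18 of the 875 test characters are pushers, a model can reach about 98% accuracy without finding a single one, so the balanced accuracy of 0.5 is the more honest summary.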
Comments on the Model
Unfortunately, what I suspected came to pass: the logistic regression did not converge. It seems that the variables were too well separated. I suspect that because there is such a small population of pushers, once you identify a characteristic of the pushers, you’ve essentially nailed it. In addition, the predictors are increments of time, so once a character gets to a certain point, they have crossed the threshold into being a pusher.