The data for this post can be found here and is as of February 2016.
The data includes fights from UFC 1 (11/12/1993) up to UFC Fight Night 83 (2/21/2016).
library(data.table)
Read fighters and fights data:
fighters <- fread("ALL_UFC_FIGHTERS.csv")
fights <- fread("ALL_UFC_FIGHTS.csv")
Set key on fighters data table and remove duplicate entries:
setkey(fighters, fid)
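# Deduplicate on the key column explicitly: recent data.table versions compare all columns by default
fighters <- unique(fighters, by = "fid")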
Set key on fights data:
setkey(fights, f1fid)
Merge fight data onto fighters data.
For each fighter, the number of wins is equal to the number of times he is fighter1 and the result for fighter1 is a win.
Similarly, we store the number of times he fought as fighter1.
merged <- fights[fighters, .(name, class, wins=sum(f1result=="win"), fights1=.N), by=.EACHI]
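The join above evaluates the `j` expression once per row of `fighters`, on the matching rows of `fights`. A minimal sketch on made-up data may make this easier to see. The tables `toy_fighters` and `toy_fights` are hypothetical and only mirror the column names used in this post:

```r
library(data.table)

# Hypothetical toy data: column names mirror the post, values are made up
toy_fighters <- data.table(fid = c(1L, 2L, 3L), name = c("A", "B", "C"))
toy_fights <- data.table(
  f1fid    = c(1L, 1L, 2L),
  f2fid    = c(2L, 3L, 3L),
  f1result = c("win", "loss", "win")
)
setkey(toy_fighters, fid)
setkey(toy_fights, f1fid)

# For each row of toy_fighters, j is computed on the matching rows of toy_fights
toy_merged <- toy_fights[toy_fighters,
                         .(name, wins = sum(f1result == "win"), fights1 = .N),
                         by = .EACHI]
print(toy_merged)
```

Fighter C never appears as fighter1 in the toy data, so the join finds no match for him: `fights1` is 0 and `wins` is `NA`, which is why the missing wins are replaced with 0 later on.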
Now change the key of the fights data so we can count the number of times each fighter was fighter2.
setkey(fights, f2fid)
In the same step, let’s also compute the total number of fights for each fighter, and their win percentage.
merged2 <- fights[merged, .(name, class, wins, fights1, fights2 = .N), by = .EACHI][
  is.na(wins), wins := 0][
  , `:=`(fights   = fights1 + fights2,
         win_perc = wins / (fights1 + fights2))]
Some fighters have very few fights, and have a win percentage of 0 or 1, which we want to adjust.
We will do so using Bayesian credibility with a conjugate beta prior. See here for a detailed example.
The idea is to get a prior beta distribution and update it using the observed outcomes to get an adjusted posterior beta distribution. The expected value of this posterior beta distribution will be our adjusted win percentage (for each fighter).
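Written out, the update takes the following form, assuming each fight is treated as an independent trial with the same win probability for a given fighter. The symbols p (win probability), w (wins) and n (fights) are notation introduced here for illustration; α₀ and β₀ are the prior parameters estimated further down:

```latex
p \sim \mathrm{Beta}(\alpha_0, \beta_0), \qquad
w \mid p \sim \mathrm{Binomial}(n, p)
\;\Longrightarrow\;
p \mid w \sim \mathrm{Beta}(\alpha_0 + w,\ \beta_0 + n - w),
\qquad
\mathbb{E}[p \mid w] = \frac{\alpha_0 + w}{\alpha_0 + \beta_0 + n}.
```

The posterior mean is the raw win percentage w/n pulled toward the prior mean α₀/(α₀ + β₀). The pull is strong when n is small and fades as n grows.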
In order to estimate the parameters of the prior distribution, we will only keep samples where the win percentage is strictly greater than 0 and less than 1. Also, we will assume that the prior expected value is 0.5, and use the method of moments to derive the values of alpha and beta.
merged2_sample <- merged2[win_perc > 0 & win_perc < 1]
hist(merged2_sample$win_perc, breaks=10, col="pink")
m1 <- 0.5                               # assumed prior mean (first moment)
m2 <- mean(merged2_sample$win_perc^2)   # second raw moment of the observed win percentages
alpha0 <- (m1 * (m1 - m2)) / (m2 - m1^2)
beta0 <- ((1 - m1) * (m1 - m2)) / (m2 - m1^2)
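For reference, the two expressions above can be recovered by matching the first two raw moments of a Beta(α, β) distribution to m1 and m2. This is a sketch of the algebra, not something stated in the original analysis:

```latex
m_1 = \frac{\alpha}{\alpha + \beta}, \qquad
m_2 = \frac{\alpha(\alpha + 1)}{(\alpha + \beta)(\alpha + \beta + 1)}
\;\Longrightarrow\;
\alpha + \beta = \frac{m_1 - m_2}{m_2 - m_1^2}, \quad
\alpha = m_1 \, \frac{m_1 - m_2}{m_2 - m_1^2}, \quad
\beta = (1 - m_1) \, \frac{m_1 - m_2}{m_2 - m_1^2}.
```

Because m1 is fixed at 0.5, alpha0 and beta0 come out equal, so the prior is symmetric around 0.5.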
We can now compute our adjusted win percentage:
merged2[,win_perc_adj := (wins + alpha0) / (fights + alpha0 + beta0)]
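To get a feel for how strongly the adjustment shrinks small samples, here is a small sketch. The helper `adjusted_win_perc` and the values `alpha_demo` and `beta_demo` are hypothetical, chosen only for illustration; they are not the values estimated from the data:

```r
# Hypothetical prior parameters, for illustration only
alpha_demo <- 3
beta_demo  <- 3

# Posterior mean of the win probability under a Beta(alpha, beta) prior
adjusted_win_perc <- function(wins, fights, alpha, beta) {
  (wins + alpha) / (fights + alpha + beta)
}

# Two undefeated fighters: one with a single fight, one with twenty
one_fight    <- adjusted_win_perc(1, 1, alpha_demo, beta_demo)    # 4/7
twenty_fight <- adjusted_win_perc(20, 20, alpha_demo, beta_demo)  # 23/26
print(c(one_fight, twenty_fight))
```

Both fighters have a raw win percentage of 1, but the one with a single fight stays close to the prior mean of 0.5, while the one with twenty fights keeps most of his record.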
And get our top-10 list of fighters:
merged2[order(-win_perc_adj), .(name, class, fights, wins, win_perc_adj)][1:10]
##                    name             class fights wins win_perc_adj
##  1:           Jon Jones Light Heavyweight     16   15    0.8142700
##  2:  Georges St. Pierre      Welterweight     21   19    0.8116540
##  3:      Conor McGregor     Featherweight      7    7    0.7636766
##  4:         Yoel Romero      Middleweight      7    7    0.7636766
##  5:       Tony Ferguson       Lightweight     11   10    0.7605096
##  6:      Anderson Silva      Middleweight     19   16    0.7571829
##  7:            Don Frye       Heavyweight     10    9    0.7457933
##  8:       Chris Weidman      Middleweight     10    9    0.7457933
##  9: Khabib Nurmagomedov       Lightweight      6    6    0.7444223
## 10:        Royce Gracie      Middleweight     13   11    0.7334771
Update: The list above could be considered a pound-for-pound top-10 list, which may explain why your favorite fighter doesn’t show up there.
Let’s make top-3 lists by weight class:
merged2[order(-win_perc_adj), .(name, class, fights, wins, win_perc_adj)][,head(.SD,3), by=class]
##                 class                name fights wins win_perc_adj
##  1: Light Heavyweight           Jon Jones     16   15    0.8142700
##  2: Light Heavyweight      Daniel Cormier      7    6    0.6883404
##  3: Light Heavyweight        Rashad Evans     19   14    0.6780497
##  4:      Welterweight  Georges St. Pierre     21   19    0.8116540
##  5:      Welterweight    Stephen Thompson      8    7    0.7101747
##  6:      Welterweight       Warlley Alves      4    4    0.6946692
##  7:     Featherweight      Conor McGregor      7    7    0.7636766
##  8:     Featherweight           Jose Aldo      8    7    0.7101747
##  9:     Featherweight        Max Holloway     14   11    0.6972986
## 10:      Middleweight         Yoel Romero      7    7    0.7636766
## 11:      Middleweight      Anderson Silva     19   16    0.7571829
## 12:      Middleweight       Chris Weidman     10    9    0.7457933
## 13:       Lightweight       Tony Ferguson     11   10    0.7605096
## 14:       Lightweight Khabib Nurmagomedov      6    6    0.7444223
## 15:       Lightweight      Donald Cerrone     20   16    0.7283640
## 16:       Heavyweight            Don Frye     10    9    0.7457933
## 17:       Heavyweight      Cain Velasquez     13   11    0.7334771
## 18:       Heavyweight   Junior dos Santos     14   11    0.6972986
## 19:         Flyweight    Joseph Benavidez     13   11    0.7334771
## 20:         Flyweight  Demetrious Johnson     13   11    0.7334771
## 21:         Flyweight        Henry Cejudo      4    4    0.6946692
## 22:       Strawweight  Joanna Jedrzejczyk      5    5    0.7217523
## 23:       Strawweight        Tecia Torres      3    3    0.6617453
## 24:       Strawweight  Valerie Letourneau      4    3    0.5973346
## 25:      Bantamweight    Raphael Assuncao      8    7    0.7101747
## 26:      Bantamweight       Dominick Cruz      4    4    0.6946692
## 27:      Bantamweight   Aljamain Sterling      4    4    0.6946692
## 28: Super Heavyweight            Jon Hess      1    1    0.5687395
## 29: Super Heavyweight       Andre Roberts      3    2    0.5539151
## 30: Super Heavyweight      Scott Ferrozzo      5    3    0.5443505
## 31:               N/A    Onassis Parungao      1    1    0.5687395
## 32:               N/A       Marcelo Mello      1    1    0.5687395
## 33:               N/A          Sione Latu      1    1    0.5687395
## 34:        Atomweight   Michelle Waterson      1    1    0.5687395
##                 class                name fights wins win_perc_adj
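The per-class lists rely on sorting first and then taking the head of each group with `.SD`. A minimal sketch of that idiom on a made-up table (the table `toy` and its values are hypothetical):

```r
library(data.table)

# Hypothetical scores for six fighters in two classes
toy <- data.table(
  class = c("A", "A", "A", "A", "B", "B"),
  name  = paste0("f", 1:6),
  score = c(0.6, 0.9, 0.7, 0.8, 0.5, 0.4)
)

# Sort by descending score, then keep the first two rows within each class
top2 <- toy[order(-score)][, head(.SD, 2), by = class]
print(top2)
```

Groups appear in the order in which they are first encountered after sorting, which is why the classes in the output above are ordered by their best fighter rather than alphabetically.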