The data for this post can be found here and is current as of February 2016.

The data includes fights from UFC 1 (11/12/1993) up to UFC Fight Night 83 (2/21/2016).

library(data.table)

Read fighters and fights data:

fighters <- fread("ALL_UFC_FIGHTERS.csv")
fights <- fread("ALL_UFC_FIGHTS.csv")

Set key on fighters data table and remove duplicate entries:

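setkey(fighters, fid)
# Deduplicate on the key column explicitly: since data.table 1.9.8,
# unique() compares all columns by default instead of only the key.
fighters <- unique(fighters, by = "fid")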
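As a small illustration of the deduplication step, here is a sketch on made-up data. The table `fighters_toy` and its rows are hypothetical; only the `fid` and `name` column names come from the post.

```r
library(data.table)

# Hypothetical fighters table in which fid 2 appears twice.
fighters_toy <- data.table(
  fid  = c(1L, 2L, 2L, 3L),
  name = c("A", "B", "B (dup)", "C")
)
setkey(fighters_toy, fid)

# Keep the first row for each fid.
deduped <- unique(fighters_toy, by = "fid")
print(deduped)
```

Passing `by = "fid"` makes the intent explicit, so the result does not depend on which data.table version is installed.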

Set key on fights data:

setkey(fights, f1fid)

Merge fight data onto fighters data.

For each fighter, the number of wins is equal to the number of times he is fighter1 and the result for fighter1 is a win.

Similarly, we store the number of times he fought as fighter1.

merged <- fights[fighters, .(name, class, wins=sum(f1result=="win"), fights1=.N), by=.EACHI]
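To see what the join with `by=.EACHI` does, here is a minimal sketch on made-up data. The tables `fights_toy` and `fighters_toy` are hypothetical; the column names mirror the ones used in the post.

```r
library(data.table)

# Hypothetical data: fighter 1 appears twice as fighter1, fighter 2 once,
# and fighter 3 never appears as fighter1.
fights_toy <- data.table(
  f1fid    = c(1L, 1L, 2L),
  f2fid    = c(2L, 3L, 3L),
  f1result = c("win", "loss", "win")
)
fighters_toy <- data.table(fid = 1:3, name = c("A", "B", "C"))

setkey(fighters_toy, fid)
setkey(fights_toy, f1fid)

# For each row of fighters_toy, j is evaluated on the matching rows of
# fights_toy, giving one output row per fighter.
merged_toy <- fights_toy[fighters_toy,
                         .(name, wins = sum(f1result == "win"), fights1 = .N),
                         by = .EACHI]
print(merged_toy)
```

A fighter with no matching rows still gets an output row, which is why the post later replaces missing `wins` values with 0.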

Now change the key of the fights data so we can count the number of times each fighter was fighter2.

setkey(fights, f2fid)

In the same step, let’s also compute the total number of fights for each fighter, and their win percentage.

merged2 <- fights[merged, .(name, class, wins, fights1, fights2=.N), by=.EACHI][
                  is.na(wins), wins := 0][
                  ,`:=`(fights = fights1 + fights2,
                        win_perc = wins / (fights1 + fights2))]

Some fighters have very few fights, and have a win percentage of 0 or 1, which we want to adjust.

We will do so using Bayesian credibility and a beta conjugate prior. See here for a detailed example.

The idea is to get a prior beta distribution and update it using the observed outcomes to get an adjusted posterior beta distribution. The expected value of this posterior beta distribution will be our adjusted win percentage (for each fighter).
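In symbols, the update described above can be written as follows. This is the standard beta-binomial conjugate result; $w$ (a fighter's wins) and $n$ (their total fights) are notation introduced here for illustration, and $\alpha_0$, $\beta_0$ are the prior parameters.

```latex
\begin{align*}
p &\sim \mathrm{Beta}(\alpha_0, \beta_0) \\
w \mid p &\sim \mathrm{Binomial}(n, p) \\
p \mid w &\sim \mathrm{Beta}(\alpha_0 + w,\ \beta_0 + n - w) \\
\mathbb{E}[p \mid w] &= \frac{\alpha_0 + w}{\alpha_0 + \beta_0 + n}
\end{align*}
```

The last line is the adjusted win percentage: with few fights it stays close to the prior mean, and with many fights it approaches the raw ratio $w/n$.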

In order to estimate the parameters of the prior distribution, we will only keep samples where the win percentage is strictly greater than 0 and less than 1. Also, we will assume that the prior expected value is 0.5, and use the method of moments to derive the values of alpha and beta.

merged2_sample <- merged2[win_perc > 0 & win_perc < 1]

hist(merged2_sample$win_perc, breaks=10, col="pink")

m1 <- 0.5                                  # assumed prior mean
m2 <- mean(merged2_sample$win_perc ^ 2)    # second raw sample moment
alpha0 <- (m1 * (m1 - m2)) / (m2 - m1^2)
beta0 <- ((1 - m1) * (m1 - m2)) / (m2 - m1^2)
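For reference, the two formulas above follow from matching the first two raw moments of a beta distribution. The sketch below uses $s = \alpha + \beta$ as a helper symbol that does not appear in the post.

```latex
\begin{align*}
m_1 &= \mathbb{E}[X] = \frac{\alpha}{\alpha + \beta}, \qquad
m_2 = \mathbb{E}[X^2] = \frac{\alpha(\alpha + 1)}{(\alpha + \beta)(\alpha + \beta + 1)} \\
s &= \alpha + \beta \;\Rightarrow\; m_2 = \frac{m_1 (m_1 s + 1)}{s + 1}
  \;\Rightarrow\; s = \frac{m_1 - m_2}{m_2 - m_1^2} \\
\alpha_0 &= m_1 s = \frac{m_1 (m_1 - m_2)}{m_2 - m_1^2}, \qquad
\beta_0 = (1 - m_1)\, s = \frac{(1 - m_1)(m_1 - m_2)}{m_2 - m_1^2}
\end{align*}
```

Because the prior mean is fixed at 0.5, the two parameters come out equal, and only their sum $s$ (the strength of the prior) is estimated from the data.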

We can now compute our adjusted win percentage:

merged2[,win_perc_adj := (wins + alpha0) / (fights + alpha0 + beta0)]
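To get a feel for how strongly the adjustment pulls small samples toward 0.5, here is a minimal sketch. The prior values are approximations back-solved from the printed `win_perc_adj` column, not numbers stated in the post, and `adjusted_win_perc` is a hypothetical helper name.

```r
# Approximate prior parameters, back-solved from the printed output
# (an assumption; the post does not report alpha0 and beta0).
alpha0 <- 3.137
beta0  <- 3.137

# Posterior mean after observing `wins` wins in `fights` fights.
adjusted_win_perc <- function(wins, fights) {
  (wins + alpha0) / (fights + alpha0 + beta0)
}

# A 1-0 record versus a 15-1 record.
raw_small <- 1 / 1
adj_small <- adjusted_win_perc(1, 1)
raw_large <- 15 / 16
adj_large <- adjusted_win_perc(15, 16)

print(c(raw_small = raw_small, adj_small = adj_small,
        raw_large = raw_large, adj_large = adj_large))
```

The single-fight record is shrunk much further from its raw value than the 16-fight record, which is the behavior the adjustment is meant to produce.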

And get our top-10 list of fighters:

merged2[order(-win_perc_adj), .(name, class, fights, wins, win_perc_adj)][1:10]
##                    name             class fights wins win_perc_adj
##  1:           Jon Jones Light Heavyweight     16   15    0.8142700
##  2:  Georges St. Pierre      Welterweight     21   19    0.8116540
##  3:      Conor McGregor     Featherweight      7    7    0.7636766
##  4:         Yoel Romero      Middleweight      7    7    0.7636766
##  5:       Tony Ferguson       Lightweight     11   10    0.7605096
##  6:      Anderson Silva      Middleweight     19   16    0.7571829
##  7:            Don Frye       Heavyweight     10    9    0.7457933
##  8:       Chris Weidman      Middleweight     10    9    0.7457933
##  9: Khabib Nurmagomedov       Lightweight      6    6    0.7444223
## 10:        Royce Gracie      Middleweight     13   11    0.7334771

Update: The list above is effectively a pound-for-pound top 10, which may explain why your favorite fighter doesn’t show up there.

Let’s make top-3 lists by weight class:

merged2[order(-win_perc_adj), .(name, class, fights, wins, win_perc_adj)][,head(.SD,3), by=class]
##                 class                name fights wins win_perc_adj
##  1: Light Heavyweight           Jon Jones     16   15    0.8142700
##  2: Light Heavyweight      Daniel Cormier      7    6    0.6883404
##  3: Light Heavyweight        Rashad Evans     19   14    0.6780497
##  4:      Welterweight  Georges St. Pierre     21   19    0.8116540
##  5:      Welterweight    Stephen Thompson      8    7    0.7101747
##  6:      Welterweight       Warlley Alves      4    4    0.6946692
##  7:     Featherweight      Conor McGregor      7    7    0.7636766
##  8:     Featherweight           Jose Aldo      8    7    0.7101747
##  9:     Featherweight        Max Holloway     14   11    0.6972986
## 10:      Middleweight         Yoel Romero      7    7    0.7636766
## 11:      Middleweight      Anderson Silva     19   16    0.7571829
## 12:      Middleweight       Chris Weidman     10    9    0.7457933
## 13:       Lightweight       Tony Ferguson     11   10    0.7605096
## 14:       Lightweight Khabib Nurmagomedov      6    6    0.7444223
## 15:       Lightweight      Donald Cerrone     20   16    0.7283640
## 16:       Heavyweight            Don Frye     10    9    0.7457933
## 17:       Heavyweight      Cain Velasquez     13   11    0.7334771
## 18:       Heavyweight   Junior dos Santos     14   11    0.6972986
## 19:         Flyweight    Joseph Benavidez     13   11    0.7334771
## 20:         Flyweight  Demetrious Johnson     13   11    0.7334771
## 21:         Flyweight        Henry Cejudo      4    4    0.6946692
## 22:       Strawweight  Joanna Jedrzejczyk      5    5    0.7217523
## 23:       Strawweight        Tecia Torres      3    3    0.6617453
## 24:       Strawweight  Valerie Letourneau      4    3    0.5973346
## 25:      Bantamweight    Raphael Assuncao      8    7    0.7101747
## 26:      Bantamweight       Dominick Cruz      4    4    0.6946692
## 27:      Bantamweight   Aljamain Sterling      4    4    0.6946692
## 28: Super Heavyweight            Jon Hess      1    1    0.5687395
## 29: Super Heavyweight       Andre Roberts      3    2    0.5539151
## 30: Super Heavyweight      Scott Ferrozzo      5    3    0.5443505
## 31:               N/A    Onassis Parungao      1    1    0.5687395
## 32:               N/A       Marcelo Mello      1    1    0.5687395
## 33:               N/A          Sione Latu      1    1    0.5687395
## 34:        Atomweight   Michelle Waterson      1    1    0.5687395
##                 class                name fights wins win_perc_adj
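The per-class ranking above relies on sorting first and then taking the head of each group. A minimal sketch of that idiom on made-up data (the table `toy` and its `score` column are hypothetical):

```r
library(data.table)

# Hypothetical scores for five fighters in two classes.
toy <- data.table(
  name  = c("A", "B", "C", "D", "E"),
  class = c("X", "X", "X", "Y", "Y"),
  score = c(0.5, 0.9, 0.7, 0.6, 0.8)
)

# Sort by descending score, then keep the first two rows of each class.
# .SD holds the rows of the current group, already in sorted order.
top2 <- toy[order(-score)][, head(.SD, 2), by = class]
print(top2)
```

Groups appear in the order in which they are first encountered after sorting, which is why the classes in the table above are ordered by their best fighter rather than alphabetically.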