In the UFC, the welterweight division (170 lbs) is sometimes referred to as the “welterwait” division because the top-ranked fighters compete against each other so infrequently. If that is the case, the rankings have plenty of room to become inaccurate and outdated. These are the rankings as of March 8th, 2022:
The objective will be to re-rank the fighters based on a few measures of centrality, examine what the centrality scores mean within the context of the fighter network, and observe any differences between the current rankings and the rankings I produce. To create my rankings list, I will take measurements of in-degree, out-degree, and eigenvector centrality. With these measurements, I will create three separate ranking lists. The sum of each fighter’s rank across these three lists will be their final score. For example, if a fighter is ranked 1 in all three ranking lists, their final score will be 3. Conversely, if they rank 12 in every list, their final score will be 36.
Fighters will be ranked by their final score.
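The rank-sum idea can be sketched on a handful of made-up fighters. Everything in this block is hypothetical: the fighter labels, the scores, and the assumption that fewer losses and higher win and eigenvector values earn a better (lower) rank. It is only meant to show how three separate rankings collapse into one final score.

```r
# Hypothetical centrality scores for four made-up fighters (not real data)
scores <- data.frame(
  fighter = c("A", "B", "C", "D"),
  losses  = c(0, 1, 2, 3),             # in-degree: assumed fewer is better
  wins    = c(4, 3, 1, 0),             # out-degree: assumed more is better
  eigen   = c(1.00, 0.80, 0.90, 0.20)  # eigenvector centrality: assumed more is better
)

# rank() gives 1 to the smallest value, so negate columns where larger is better
rank_losses <- rank(scores$losses, ties.method = "min")
rank_wins   <- rank(-scores$wins,  ties.method = "min")
rank_eigen  <- rank(-scores$eigen, ties.method = "min")

# Final score is the sum of the three ranks; lower is better
scores$final_score <- rank_losses + rank_wins + rank_eigen
scores[order(scores$final_score), c("fighter", "final_score")]
```

Using `ties.method = "min"` gives tied fighters the same rank, which is why a separate tie-breaking step is still needed afterwards.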
Since there will likely be ties within the centrality rankings, I will implement some tie-breaking criteria. In order, the tie-breaking criteria are:
To keep the notation consistent in my analysis, Kamaru Usman will be ranked #1 instead of being listed as the champion. Everyone else’s rank n will be shifted to n+1 to accommodate this.
I created four intuitively named .csv files with data collected from the fighters’ individual Wikipedia pages: fighter_nodes.csv is an attribute table, fighter_nodes2.csv is a copy of that attribute table excluding #12 Khamzat Chimaev (created for debugging purposes), fighter_edges.csv is an adjacency matrix of competition history, and fighter_wins.csv is a data table recording which fighters have wins over which others.
# Packages
library(igraph)
# Reading in the data
nodes <- read.csv("fighter_nodes.csv", header = TRUE, as.is = TRUE)
nodes2 <- read.csv("fighter_nodes2.csv", header = TRUE, as.is = TRUE) # Excluding Chimaev
edges <- read.csv("fighter_edges.csv", header = TRUE, row.names = 1)
wins <- read.csv("fighter_wins.csv", header = TRUE)
# Graph object of which fighters in the top 12 have fought each other
top12edges <- as.matrix(edges)
fighter_edges <- graph_from_adjacency_matrix(top12edges, mode = "undirected")
# Graph object of fighters in the top 12 that have wins over each other
fighter_wins <- graph_from_data_frame(wins)
# Color objects
Vcolrs <- c(ifelse(nodes[V(fighter_edges), 2] == "1", "gold", "palegreen"))
Vlabelcolrs <- c(ifelse(nodes[V(fighter_edges), 2] == "1", "chocolate3", "black"))
Ecolrs <- c(ifelse(wins[E(fighter_wins), 3] == "2", "black", "gray50"))
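To make the two input shapes concrete, here is a minimal sketch with made-up fighters. The labels, the `from` and `to` column names, and the values are assumptions for illustration only. The real files may be laid out differently, apart from the `Weight` column that the plots rely on.

```r
library(igraph)

# Hypothetical adjacency matrix: 1 means the two fighters have fought
fighters <- c("A", "B", "C")
adj <- matrix(c(0, 1, 1,
                1, 0, 0,
                1, 0, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(fighters, fighters))
g_fights <- graph_from_adjacency_matrix(adj, mode = "undirected")

# Hypothetical edge list: one row per winner -> loser pairing,
# with Weight holding the number of victories
win_list <- data.frame(from = c("A", "A"), to = c("B", "C"), Weight = c(2, 1))
g_wins <- graph_from_data_frame(win_list, directed = TRUE)

ecount(g_fights)    # number of distinct pairings
E(g_wins)$Weight    # extra data frame columns become edge attributes
```

The adjacency matrix suits the undirected "who has fought whom" view, while the edge list carries direction and a per-edge attribute, which is what the victory network needs.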
Examining the entire competition history of my sample of fighters allows us to see who has fought whom in the top 12 to earn their current ranking. Nodes represent fighters, and an edge indicates that two fighters have fought before.
par(mfrow=c(1,2))
# Plot 1
set.seed(8)
plot(fighter_edges, vertex.label=nodes$Rank, vertex.size=30,
vertex.color=Vcolrs, vertex.label.color=Vlabelcolrs,
edge.width=2, sub="Rank notation")
legend (x=-1.5, y=-1.5, c("Champion", "Contender"), pch=21,
col="#777777", pt.bg=Vcolrs, pt.cex=1.25, cex=.8,
bty="o", ncol=1)
# Plot 2
set.seed(8)
plot(fighter_edges, vertex.size=30, vertex.color=Vcolrs,
vertex.label.color=Vlabelcolrs, edge.width=2,
sub="Name notation")
legend (x=-1.5, y=-1.5, c("Champion", "Contender"), pch=21,
col="#777777", pt.bg=Vcolrs, pt.cex=1.25, cex=.8,
bty="o", ncol=1)
# Title
mtext("Top 12 UFC Welterweights Competition History",
side = 3, line = -2.5, outer = TRUE, cex = 1.7)
A few interesting insights are that #2 Covington and #3 Burns have very high rankings despite having fought only one other person besides the champ, while most of the other contenders in the mix have been much more active. #9 Brady and #10 Magny have only one fight each with another top 12 contender, while #12 Chimaev has none. This isn’t too alarming considering they are at the bottom of the rankings and perhaps haven’t had a chance to prove themselves yet against higher-ranked competition. Notably, it is difficult to draw meaningful conclusions from the competition history without knowing the results. Let’s look at those now.
NOTE: The name notation plot above is included to help map the fighter names to my adjusted rank notation of n + 1. It is a bit crowded as a visualization, so moving forward the plots will only use rank notation.
# Plot 1
set.seed(7)
plot(fighter_wins, edge.width=(E(fighter_wins)$Weight), edge.arrow.size = .6,
edge.color=Ecolrs, vertex.size=30, vertex.label=nodes2$Rank,
vertex.color=Vcolrs, vertex.label.color=Vlabelcolrs,
sub="Black edges indicate 2 victories")
legend (x=-2.5, y=0, c("Champion", "Contender"), pch=21,
col="#777777", pt.bg=Vcolrs, pt.cex=1.25, cex=.8,
bty="o", ncol=1)
# Title
mtext("Top 12 UFC Welterweight Victory Network",
side = 3, line = -1.8, outer = TRUE, cex = 1.8)
This plot presents a better view of the landscape of the welterweight division. #12 (Chimaev) is eliminated from this plot because he has no ties to any of the other nodes, and he will remain visually unrepresented in the remaining plots for the same reason. The outgoing arrowheads represent a victory, and the incoming arrowheads represent a defeat. This visualization does a good job of highlighting the champion and his work in the division. Even by the eye test alone, it would seem that the champion has done well to earn his position.
The degree of a node is the number of connections it has to other nodes. For directed networks such as fighter_wins (the above plot), there can be in-degree, out-degree, and total degree. In-degree is the number of edges pointing into a node, out-degree is the number of edges originating from a node toward some other node, and total degree is the sum of the two. In the context of this network, the in-degree is the number of losses to other top 12 fighters, the out-degree is the number of wins, and the total degree is the total number of fights that a fighter has within the top 12.
# Degree
top12losses <- degree(fighter_wins, mode="in")
top12wins <- degree(fighter_wins, mode="out")
top12fights <- degree(fighter_wins, mode="total")
degdat <- data.frame(top12losses, top12wins, top12fights)
degdat
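As a quick sanity check on edge direction, the same three calls can be run on a tiny made-up network where the answer is easy to verify by hand. The fighters and results here are hypothetical, and the sketch assumes each row reads winner -> loser.

```r
library(igraph)

# Hypothetical results: A beat B, A beat C, B beat C
toy_results <- data.frame(
  from = c("A", "A", "B"),
  to   = c("B", "C", "C")
)
toy <- graph_from_data_frame(toy_results, directed = TRUE)

toy_wins   <- degree(toy, mode = "out")    # edges leaving a node = wins
toy_losses <- degree(toy, mode = "in")     # edges entering a node = losses
toy_fights <- degree(toy, mode = "total")  # wins + losses

data.frame(toy_losses, toy_wins, toy_fights)
```

Fighter A should show two wins and no losses, and fighter C two losses and no wins. If the columns came out swapped, the edge list would be oriented loser -> winner instead.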
# Nodes weighted by in-degree
set.seed(7)
plot(fighter_wins, edge.width=(E(fighter_wins)$Weight), edge.arrow.size = .3,
edge.color=Ecolrs, vertex.label=nodes2$Rank, vertex.color=Vcolrs,
vertex.label.color=Vlabelcolrs, vertex.size = top12losses*20,
sub="WELTERWEIGHT VICTORY NETWORK")
legend (x=-2.5, y=-1.5, c("Champion", "Contender"), pch=21,
col="#777777", pt.bg=Vcolrs, pt.cex=1.25, cex=.8,
bty="o", ncol=1)
mtext("Weighted By Losses",
side = 3, line = -1.8, outer = TRUE, cex = 1.8)
# Nodes weighted by out-degree
set.seed(7)
plot(fighter_wins, edge.width=(E(fighter_wins)$Weight), edge.arrow.size = .3,
edge.color=Ecolrs, vertex.label=nodes2$Rank, vertex.color=Vcolrs,
vertex.label.color=Vlabelcolrs, vertex.size = top12wins*20,
sub="WELTERWEIGHT VICTORY NETWORK")
legend (x=-2.5, y=-1.5, c("Champion", "Contender"), pch=21,
col="#777777", pt.bg=Vcolrs, pt.cex=1.25, cex=.8,
bty="o", ncol=1)
mtext("Weighted By Wins",
side = 3, line = -1.8, outer = TRUE, cex = 1.8)
# Nodes weighted by total-degree
set.seed(7)
plot(fighter_wins, edge.width=(E(fighter_wins)$Weight), edge.arrow.size = .3,
edge.color=Ecolrs, vertex.label=nodes2$Rank, vertex.color=Vcolrs,
vertex.label.color=Vlabelcolrs, vertex.size = top12fights*15,
sub="WELTERWEIGHT VICTORY NETWORK")
legend (x=-2.5, y=-1.5, c("Champion", "Contender"), pch=21,
col="#777777", pt.bg=Vcolrs, pt.cex=1.25, cex=.8,
bty="o", ncol=1)
mtext("Weighted By Number Of Fights",
side = 3, line = -1.8, outer = TRUE, cex = 1.8)
# All plots side by side
par(mfrow=c(1,3))
set.seed(7)
plot(fighter_wins, edge.width=(E(fighter_wins)$Weight), edge.arrow.size = .3,
edge.color=Ecolrs, vertex.label=nodes2$Rank, vertex.color=Vcolrs,
vertex.label.color=Vlabelcolrs, vertex.size = top12losses*20, sub="LOSSES")
set.seed(7)
plot(fighter_wins, edge.width=(E(fighter_wins)$Weight), edge.arrow.size = .3,
edge.color=Ecolrs, vertex.label=nodes2$Rank, vertex.color=Vcolrs,
vertex.label.color=Vlabelcolrs, vertex.size = top12wins*20, sub="WINS")
set.seed(7)
plot(fighter_wins, edge.width=(E(fighter_wins)$Weight), edge.arrow.size = .3,
edge.color=Ecolrs, vertex.label=nodes2$Rank, vertex.color=Vcolrs,
vertex.label.color=Vlabelcolrs, vertex.size = top12fights*15,
sub="TOTAL FIGHTS")
mtext("Top 12 Welterweight Victory Network",
side = 3, line = -3, outer = TRUE, cex = 1.8)
Eigenvector centrality assigns relative scores to a node based on its connection to other highly or poorly connected nodes. Connections to highly connected nodes result in a higher score than connections to poorly connected nodes. In the context of this network, being connected to a node with many connections would be described as having a victory over a fighter who has many victories over other fighters within the top 12.
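A small base R sketch may help show why the quality of a node's neighbors matters. It uses a hypothetical five-node undirected network and computes the scores directly from the leading eigenvector of the adjacency matrix, scaled so the top score is 1. This is a hand-rolled illustration, not the igraph routine used in this analysis.

```r
# Hypothetical undirected network: B is a hub; C and E each have one connection
toy_nodes <- c("A", "B", "C", "D", "E")
toy_edges <- rbind(c("A", "B"), c("B", "C"), c("B", "D"), c("D", "E"))

# Build a symmetric adjacency matrix from the edge pairs
A <- matrix(0, 5, 5, dimnames = list(toy_nodes, toy_nodes))
for (i in seq_len(nrow(toy_edges))) {
  A[toy_edges[i, 1], toy_edges[i, 2]] <- 1
  A[toy_edges[i, 2], toy_edges[i, 1]] <- 1
}

# For a symmetric matrix, eigen() returns eigenvalues in decreasing order,
# so the first column is the leading eigenvector (its sign is arbitrary)
lead <- abs(eigen(A, symmetric = TRUE)$vectors[, 1])
centrality <- setNames(lead / max(lead), toy_nodes)
round(centrality, 3)
```

C and E both have exactly one connection, but C is tied to the hub B while E is tied to the less connected D, so C ends up with the higher score.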
Unfortunately, this is where the igraph package has some limitations. In directed networks, there is an in-eigenvector and an out-eigenvector (similar to in-degree and out-degree). A limitation of igraph is that it can only measure in-eigenvector values. This would not be a helpful ranking criterion, since it would rank the fighters based on their connections to poorly connected fighters in the network (i.e., fighters who don’t have a lot of wins). A possible solution could be to use this ranking anyway and then reverse-rank the results, but that would produce a very large tie across several fighters. Fighters would then essentially be ranked by the tie-breaking criteria, which is not the objective.
Instead, I will be using undirected eigenvector centrality. While it doesn’t achieve exactly what I would like it to do, I think it does add an interesting element to rank by nonetheless. Rather than this being a score of the quality of fighters based on their wins over others, it is a score based on how “game” a fighter is. If a fighter has many fights within the top 12, and the opponents that they fight have many fights within the top 12, then their score would be expected to be very high. This can also be interpreted as how active a fighter is in the division, and if their opponents are active in the division as well.
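# directed = FALSE (the default) ignores edge direction: undirected eigenvector centrality
top12ev <- eigen_centrality(fighter_wins, directed = FALSE)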
top12ev
## $vector
##     Usman Covington     Burns   Edwards     Luque  Muhammad  Thompson  Masvidal
## 0.8590015 0.5867373 0.5776282 0.5572704 0.9066381 0.5926633 0.9711392 1.0000000
##     Brady    Chiesa     Magny
## 0.2371860 0.7514934 0.2371860
##
## $value
## [1] 3.168371
##
## $options
## $options$bmat
## [1] "I"
##
## $options$n
## [1] 11
##
## $options$which
## [1] "LA"
##
## $options$nev
## [1] 1
##
## $options$tol
## [1] 0
##
## $options$ncv
## [1] 0
##
## $options$ldv
## [1] 0
##
## $options$ishift
## [1] 1
##
## $options$maxiter
## [1] 1000
##
## $options$nb
## [1] 1
##
## $options$mode
## [1] 1
##
## $options$start
## [1] 1
##
## $options$sigma
## [1] 0
##
## $options$sigmai
## [1] 0
##
## $options$info
## [1] 0
##
## $options$iter
## [1] 11
##
## $options$nconv
## [1] 1
##
## $options$numop
## [1] 35
##
## $options$numopb
## [1] 0
##
## $options$numreo
## [1] 24
# Nodes weighted by eigenvector centrality
set.seed(7)
plot(fighter_wins, edge.width=(E(fighter_wins)$Weight), edge.arrow.size = .6,
edge.color=Ecolrs, vertex.label=nodes2$Rank, vertex.color=Vcolrs,
vertex.label.color=Vlabelcolrs, vertex.size = top12ev$vector*35,
sub="WEIGHTED BY EIGENVECTOR CENTRALITY")
legend (x=-2.5, y=0, c("Champion", "Contender"), pch=21,
col="#777777", pt.bg=Vcolrs, pt.cex=1.25, cex=.8,
bty="o", ncol=1)
mtext("Top 12 UFC Welterweight Victory Network",
side = 3, line = -1.8, outer = TRUE, cex = 1.8)
Let’s compile the data frames that have been created so far, and add the tie-breaking criteria to the data frame as well.
# Additional items for the data frame
totalnum_ufcfights <- nodes2$Num_Fights_UFC
totalnum_ufcwins <- nodes2$UFC_W
finishes <- data.frame(nodes2$KO.TKO, nodes2$SUB)
decisions <- nodes2$DEC
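# Keep only the centrality scores; the full eigen_centrality() result also holds $value and $options
dat <- data.frame(degdat, totalnum_ufcfights, totalnum_ufcwins, finishes, decisions,
eigenvector = top12ev$vector)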
dat
From this data frame, the final rankings can be created.
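As a rough sketch of how ties in the final score could be resolved, `order()` accepts several sort keys and only consults the later ones when the earlier ones are tied. The fighters, scores, tie-breaker columns, and their order below are placeholders for illustration. They are not the actual criteria or data used in this analysis.

```r
# Hypothetical final scores and tie-breakers for made-up fighters
ranking <- data.frame(
  fighter     = c("A", "B", "C", "D"),
  final_score = c(5, 9, 5, 9),    # lower is better
  ufc_wins    = c(10, 7, 12, 7),  # placeholder first tie-breaker, higher is better
  finishes    = c(4, 2, 6, 5)     # placeholder second tie-breaker, higher is better
)

# Sort by final score, then break ties with the remaining keys in turn
ord <- order(ranking$final_score, -ranking$ufc_wins, -ranking$finishes)
ranking <- ranking[ord, ]
ranking$new_rank <- seq_len(nrow(ranking))
ranking
```

Here A and C tie on final score and are separated by the first tie-breaker, while B and D tie on both final score and the first tie-breaker, so the second one decides.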
And here are the final rankings compared side by side with the official UFC rankings.
My calculated rankings are quite dissimilar from the official rankings. That alone does not mean my rankings should be dismissed, but as a fan of the sport I can see that I have missed the mark a bit. For example, Stephen Thompson is ranked very high despite being on a two-fight losing streak and not having as good a winning percentage as the other fighters. Also, Sean Brady is new to the scene and has only one victory within the top 12, yet he is ranked at a position that is likely much higher than where he should be. From these examples I can infer that my ranking system is biased towards undefeated (yet unproven) fighters, and that the use of undirected eigenvector centrality is likely skewing the results as well. Ultimately, these are interesting insights to observe within this small network, but they are not indicative of who the best fighters truly are at this weight class (aside from Kamaru Usman). Some additional avenues to pursue would be observing out-eigenvector centrality and how it influences the rankings, as well as expanding the list of fighters within the network.
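One possible way to approach out-eigenvector centrality, offered as an untested idea rather than part of this analysis, is to reverse every edge so that arrows run loser -> winner and then compute directed eigenvector centrality on the reversed graph. The toy network below is hypothetical and deliberately contains a cycle. My understanding is that the directed measure can degenerate on networks with few or no cycles, which a real victory network may well be, so the results would need careful checking.

```r
library(igraph)

# Hypothetical results (winner -> loser), including a cycle A -> B -> C -> A
toy <- graph_from_data_frame(
  data.frame(from = c("A", "B", "C", "A"),
             to   = c("B", "C", "A", "C")),
  directed = TRUE
)

# Swap the two columns of the edge list so every arrow points loser -> winner
el <- as_edgelist(toy)
reversed <- graph_from_edgelist(el[, 2:1], directed = TRUE)

# On the reversed graph, a fighter is credited through incoming edges
# from the opponents they have beaten
out_ev <- eigen_centrality(reversed, directed = TRUE)$vector
round(out_ev, 3)
```

A simple check that the reversal worked is that each fighter's in-degree in `reversed` equals their out-degree (wins) in the original graph.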