Karim M Project 2

Essay

  1. I chose this data set because I have been watching a good amount of MMA (Mixed Martial Arts) over the last few years, and I wanted to see how many fighters prefer a specific stance and whether arm length (reach) affects their ability to strike an opponent. The ufc_stats data set has many numerical variables that I could use to test my assumption about the relationship between a fighter's reach and strike accuracy. I first renamed the abbreviated columns so that they are legible to me and to any reader. I then removed rows with NAs in the columns I was filtering on and identified the unique stances in the data set.

  2. Within the data set I looked at the four stances fighters use: Orthodox, Southpaw, Open Stance, and Switch. Orthodox is traditionally the most frequent because it is the right-hand-dominant stance. Southpaw, the left-hand-dominant stance, is less frequent but still common, and it gives Orthodox fighters a bit of a challenge because the two stances mirror each other. Switch and Open Stance are far less frequent. A Switch fighter alternates between Orthodox and Southpaw, while Open Stance is a looser approach with more fluidity.

  3. In the linear regression and the diagnostic plots I found it interesting that arm reach was not a major factor in fighters' striking accuracy. Although reach matters in practice (space creation and jabbing), it had only a very small positive correlation with striking accuracy. I created three other visuals: a scatter plot with the linear regression line and its standard error band, a jitter plot showing fighters' accuracy across the four stances, and a bar chart of the top 15 PFP (Pound for Pound) fighters in the world showing their striking accuracy and stance.

Introduction

A fighter's success can be explained through a variety of physical and mental characteristics. The data set I will be using covers 13 numerical variables that range from a fighter's Height, Weight, and Reach to their striking accuracy and significant strikes landed per minute. Using the UFC stats data set, sourced from the UFC stats webpage, I will explore a few of these physical characteristics in relation to a fighter's success (the ability to land strikes on target, based on arm span), and then take a look at the top 15 PFP (Pound for Pound) men's fighters in the UFC (Ultimate Fighting Championship).

Metrics:

  • Reach: Horizontal distance from fingertip to fingertip with the arms outstretched.

  • Strike Accuracy: Percentage of attempted significant strikes that land on target.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggfortify)
setwd("~/Desktop/Data Science MC/Data Science 110")
ufc_stats <- read_csv("ufc_stats.csv")
New names:
• `` -> `...1`
Rows: 1673 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): fighter_name, Stance
dbl (13): ...1, Height, Weight, Reach, Birthyear, SLpM, Str_Acc, SApM, Str_D...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(ufc_stats)
# A tibble: 6 × 15
   ...1 fighter_name    Height Weight Reach Stance Birthyear  SLpM Str_Acc  SApM
  <dbl> <chr>            <dbl>  <dbl> <dbl> <chr>      <dbl> <dbl>   <dbl> <dbl>
1     1 Shamil Abdurak…     75    235    76 Ortho…      1981  2.45      44  2.45
2     2 Daichi Abe          71    170    71 Ortho…      1991  3.8       33  4.49
3     3 Klidson Abreu       72    205    74 Ortho…      1992  2.05      40  2.9 
4     4 Juan Adams          77    265    80 Ortho…      1992  7.09      55  4.06
5     5 Anthony Adams       73    185    76 Ortho…      1988  3.17      41  5.93
6     6 Israel Adesanya     76    185    80 Switch      1989  3.95      49  2.63
# ℹ 5 more variables: Str_Def <dbl>, TD_Avg <dbl>, TD_Acc <dbl>, TD_Def <dbl>,
#   Sub_Avg <dbl>
# Renaming variables
ufc_stats <- ufc_stats |>
  rename(
    `Fighter Name` = fighter_name,
    `Significant Strikes Landed per Min` = SLpM,
    `Strike Accuracy` = Str_Acc,
    `Significant Strikes Absorbed per Min` = SApM,
    `Strike Defense` = Str_Def,
    `Takedown Average` = TD_Avg,
    `Takedown Accuracy` = TD_Acc,
    `Takedown Defense` = TD_Def,
    `Submission Average` = Sub_Avg
  )
# Remove rows with missing values (NAs) and identify the unique fighter stances
clean_ufc_stats <- ufc_stats |>
  filter(!is.na(Stance), !is.na(`Significant Strikes Landed per Min`), !is.na(`Strike Accuracy`))

unique(clean_ufc_stats$Stance)
[1] "Orthodox"    "Switch"      "Southpaw"    "Open Stance"
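Listing the unique stances shows which categories exist but not how common each one is. The sketch below shows one way to tabulate stance frequency in base R. The `stances` vector is a small made-up example for illustration only and is not taken from ufc_stats.csv.

```r
# Toy stance vector for illustration only; NOT values from ufc_stats.csv
stances <- c("Orthodox", "Orthodox", "Southpaw", "Orthodox",
             "Switch", "Southpaw", "Orthodox", "Open Stance")

# Count fighters per stance, most frequent first
stance_counts <- sort(table(stances), decreasing = TRUE)

# Convert the counts to percentages of all fighters
stance_share <- round(100 * prop.table(stance_counts), 1)

print(stance_counts)
print(stance_share)
```

On the real data, the same tabulation could be written as `clean_ufc_stats |> count(Stance, sort = TRUE)`.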
# Linear regression of strike accuracy on fighters' reach
lm_ufc_stats <- lm(`Strike Accuracy` ~ Reach, data = clean_ufc_stats)

summary(lm_ufc_stats)

Call:
lm(formula = `Strike Accuracy` ~ Reach, data = clean_ufc_stats)

Residuals:
    Min      1Q  Median      3Q     Max 
-36.961  -5.961  -0.550   5.233  42.099 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 21.76501    4.01146   5.426 6.63e-08 ***
Reach        0.31346    0.05575   5.623 2.20e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.465 on 1652 degrees of freedom
Multiple R-squared:  0.01878,   Adjusted R-squared:  0.01819 
F-statistic: 31.62 on 1 and 1652 DF,  p-value: 2.201e-08
# Diagnostic Plots
autoplot(lm_ufc_stats, 1:4, nrow = 2, ncol = 2)

Linear Regression Analysis

The fitted regression equation is Strike Accuracy = 21.77 + 0.31 × Reach, so each additional inch of reach is associated with an increase of about 0.31 percentage points in strike accuracy. The p-value of the model is 2.20e-08 (0.000000022), which is below the 0.05 threshold, so the relationship is statistically significant. That being said, the R-squared value is 0.0188, meaning reach explains only about 1.88% of the variance in strike accuracy.
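To make the size of this effect concrete, the short sketch below plugs the coefficients from the summary output above into the regression equation. The helper name `predict_accuracy` and the example reaches of 70 and 80 inches are illustrative choices, not part of the original analysis.

```r
# Values copied from the summary(lm_ufc_stats) output above
intercept <- 21.76501
slope     <- 0.31346
r_squared <- 0.01878

# Hypothetical helper: predicted strike accuracy (%) for a reach in inches
predict_accuracy <- function(reach_in) {
  intercept + slope * reach_in
}

short_reach <- predict_accuracy(70)
long_reach  <- predict_accuracy(80)

# Predicted gap between an 80-inch and a 70-inch reach
accuracy_gap <- long_reach - short_reach

# With a single predictor, the correlation is the square root of R-squared,
# carrying the sign of the slope
correlation <- sign(slope) * sqrt(r_squared)

print(round(c(short_reach, long_reach, accuracy_gap, correlation), 2))
```

Under these numbers, ten extra inches of reach corresponds to roughly 3 percentage points of predicted accuracy, which is small next to the residual standard error of 9.465.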

# Linear Regression Model
ggplot(lm_ufc_stats, aes(x = Reach, y = `Strike Accuracy`)) +
  geom_point(alpha = 0.6, color = "#0FA8E2") +
  geom_smooth(method = "lm", se = TRUE, color = "#F10000", linewidth = 0.8, linetype = "dashed") +
  labs(title = "Correlation of Fighters Reach to Strike Accuracy (%)", x = "Reach (Inches)", y = "Strike Accuracy (%)", caption = "Source: ufcstats.com") +
  theme_bw(base_size = 8, base_family = "Georgia") +
  theme(plot.title = element_text(size = 14, face = "bold", hjust = 0.5), axis.title = element_text(size = 10), axis.text = element_text(size = 10), plot.caption = element_text(size = 8))
`geom_smooth()` using formula = 'y ~ x'

# Jitter Plot that represents Stance and Strike Accuracy 
ggplot(clean_ufc_stats, aes(x = Stance, y = `Strike Accuracy`, color = Stance)) +
  geom_jitter(width = 0.25, alpha = 0.5, size = 2) + 
  labs(title = "Strike Accuracy of 4 Fight Stances", x = "Fight Stance", y = "Strike Accuracy (%)", caption = "Source: ufcstats.com", color = "Fight Stance") +
  theme_bw(base_size = 8, base_family = "Georgia") +
  theme(plot.title = element_text(size = 14, face = "bold", hjust = 0.5), axis.title = element_text(size = 10), axis.text = element_text(size = 10), plot.caption = element_text(size = 8))

# Filter for the top 15 PFP fighters in the UFC
men_pfp <- clean_ufc_stats |>
  filter(`Fighter Name` %in% c(
    "Islam Makhachev", "Jon Jones", "Ilia Topuria", "Merab Dvalishvili",
    "Dricus Du Plessis", "Magomed Ankalaev", "Belal Muhammad", "Carlos Ulberg",
    "Alexander Volkanovski", "Alexandre Pantoja", "Tom Aspinall", "Max Holloway",
    "Sean O'Malley", "Charles Oliveira", "Arman Tsarukyan"
  ))


unique(men_pfp$`Fighter Name`)
 [1] "Magomed Ankalaev"      "Tom Aspinall"          "Dricus Du Plessis"    
 [4] "Merab Dvalishvili"     "Max Holloway"          "Jon Jones"            
 [7] "Islam Makhachev"       "Belal Muhammad"        "Sean O'Malley"        
[10] "Charles Oliveira"      "Alexandre Pantoja"     "Ilia Topuria"         
[13] "Arman Tsarukyan"       "Carlos Ulberg"         "Alexander Volkanovski"
#Graph for Strike Accuracy and Stance
ggplot(men_pfp, aes(x = `Fighter Name`, y = `Strike Accuracy`, fill = Stance)) +
  geom_col(color = "black") +
  labs(title = "Strike Accuracy of PFP UFC Fighters", x = "PFP Fighters", y = "Strike Accuracy (%)", caption = "Source: ufcstats.com", fill = "Stance") +
  theme_bw(base_size = 8, base_family = "Georgia") +
  theme(plot.title = element_text(face = "bold", hjust = 0.5), axis.text.x = element_text(angle = 45, hjust = 1), axis.text = element_text(size = 9), legend.position = "right")

Sources

  • http://www.ufcstats.com/statistics/fighters

  • https://www.ufc.com/rankings

  • https://mmaexplained.com/articles/manual-to-mma-stances/