I chose this data set because I have watched a good amount of MMA (Mixed Martial Arts) over the last few years and wanted to see how many fighters prefer a specific stance, and whether arm length (reach) affects their ability to strike an opponent. The ufc_stats data set contains a number of numerical variables that I could use to test my assumption about the relationship between a fighter's reach and strike accuracy. I first renamed the columns so that they would be legible to me and to any potential viewer. I then removed NAs from the columns I was filtering on and identified the unique stances in the data set.
Within the data set I looked at the four stances fighters use: Orthodox, Southpaw, Open Stance, and Switch. Orthodox is traditionally the most common because it is the right-hand-dominant stance. Southpaw, the left-hand-dominant stance, is less frequent but still common, and it gives Orthodox fighters a bit of a challenge because the two stances mirror each other. Switch and Open Stance are far less frequent: a Switch fighter alternates between Orthodox and Southpaw, while Open Stance is a looser, more fluid approach.
In the linear regression visuals and the diagnostic plots, I found it interesting that arm reach was not a major factor in a fighter's striking accuracy. Although reach matters in practice (creating space and jabbing), it had only a very small positive correlation with striking accuracy. The visuals I created were a scatter plot with a fitted regression line and standard error band, a jitter plot showing fighters' accuracy across the four stances, and a chart of the top 15 PFP (Pound for Pound) fighters in the world with their striking accuracy and stance.
Introduction
A fighter's success can be explained by a variety of physical and mental characteristics. The data set I will be using covers 13 numerical variables that range from a fighter's height, weight, and reach to striking accuracy and significant strikes landed per minute. Using the UFC stats data set, sourced from the UFC stats webpage, I will explore a few of these physical characteristics in relation to a fighter's success (the ability to land strikes on target, based on arm span), and then look at the top 15 men's PFP (Pound for Pound) fighters in the UFC (Ultimate Fighting Championship).
Metrics:
Reach: Horizontal distance from fingertip to fingertip with the arms fully outstretched.
Strike Accuracy: Percentage of attempted strikes that land on target.
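As a quick illustration of the strike accuracy metric, the sketch below computes it as strikes landed divided by strikes attempted, expressed as a percentage. The counts are hypothetical and are not taken from the data set; the variable names are made up for this example.

```r
# Hypothetical counts for one fighter (not taken from the data set)
strikes_landed    <- 120
strikes_attempted <- 250

# Strike accuracy as a percentage of attempted strikes that landed
strike_accuracy <- 100 * strikes_landed / strikes_attempted
strike_accuracy
```

The data set already stores this value per fighter, so the calculation is only meant to show how to read the percentages used in the plots.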
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
New names:
• `` -> `...1`
Rows: 1673 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): fighter_name, Stance
dbl (13): ...1, Height, Weight, Reach, Birthyear, SLpM, Str_Acc, SApM, Str_D...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
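The renaming step is not shown in this report, so here is a minimal sketch of how it might look with dplyr's rename(). The mapping from raw names (fighter_name, SLpM, Str_Acc) to readable names is an assumption inferred from the column specification and from the names used in later code; raw_stats is a hypothetical toy stand-in for the imported file.

```r
library(dplyr)

# Hypothetical toy stand-in for the imported data (the real file has 15 columns)
raw_stats <- tibble::tibble(
  fighter_name = c("Fighter A", "Fighter B"),
  Stance       = c("Orthodox", "Southpaw"),
  Reach        = c(72, 70),
  SLpM         = c(4.1, 3.2),
  Str_Acc      = c(48, 45)
)

# rename() takes new_name = old_name pairs; names containing spaces need backticks.
# The old-to-new mapping below is assumed, not confirmed by the source.
ufc_stats <- raw_stats |>
  rename(
    `Fighter Name` = fighter_name,
    `Significant Strikes Landed per Min` = SLpM,
    `Strike Accuracy` = Str_Acc
  )

names(ufc_stats)
```

Readable names with spaces make plot labels and printed tables easier to follow, at the cost of needing backticks whenever the columns are referenced in code.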
# Remove rows with missing values (NAs) and identify the unique fighter stances
clean_ufc_stats <- ufc_stats |>
  filter(
    !is.na(Stance),
    !is.na(`Significant Strikes Landed per Min`),
    !is.na(`Strike Accuracy`)
  )

unique(clean_ufc_stats$Stance)
[1] "Orthodox" "Switch" "Southpaw" "Open Stance"
# Linear regression of strike accuracy on fighter reach
lm_ufc_stats <- lm(`Strike Accuracy` ~ Reach, data = clean_ufc_stats)
summary(lm_ufc_stats)
Call:
lm(formula = `Strike Accuracy` ~ Reach, data = clean_ufc_stats)

Residuals:
    Min      1Q  Median      3Q     Max
-36.961  -5.961  -0.550   5.233  42.099

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 21.76501    4.01146   5.426 6.63e-08 ***
Reach        0.31346    0.05575   5.623 2.20e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.465 on 1652 degrees of freedom
Multiple R-squared:  0.01878,	Adjusted R-squared:  0.01819
F-statistic: 31.62 on 1 and 1652 DF,  p-value: 2.201e-08
The regression model equation is Strike Accuracy = 21.77 + 0.31 × Reach, so each additional inch of reach is associated with an increase of about 0.31 percentage points in strike accuracy. The p-value of the model is 2.20e-08 (0.000000022), which is below the 0.05 threshold, so the relationship is statistically significant. That being said, the R-squared value is 0.0188, meaning reach explains only about 1.88% of the variance in strike accuracy.
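To make the size of the effect concrete, the sketch below plugs two reaches into the fitted equation and cross-checks the reported statistics against each other. It uses only the numbers printed in the summary output; predict_accuracy() is a hypothetical helper written for this illustration, and the reaches of 70 and 75 inches are arbitrary example values.

```r
# Coefficients and statistics as reported in the summary output
intercept <- 21.76501
slope     <- 0.31346
t_reach   <- 5.623
f_stat    <- 31.62
df_resid  <- 1652

# Hypothetical helper: predicted strike accuracy (%) for a reach in inches
predict_accuracy <- function(reach_in) intercept + slope * reach_in

pred_70 <- predict_accuracy(70)   # about 43.7
pred_75 <- predict_accuracy(75)   # about 45.3
pred_75 - pred_70                 # five extra inches add about 1.57 points

# With a single predictor, the F-statistic equals the squared t value of the slope
t_reach^2                         # about 31.62

# R-squared can be recovered from F and the residual degrees of freedom
r2 <- f_stat / (f_stat + df_resid)
r2                                # about 0.0188
```

A five-inch difference in reach moves the predicted accuracy by well under two percentage points, which is small next to the residual standard error of 9.465. That is consistent with the low R-squared.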
# Scatter plot of reach vs. strike accuracy with the fitted regression line
# (ggplot2 fortifies the lm object into the rows that were used in the fit)
ggplot(lm_ufc_stats, aes(x = Reach, y = `Strike Accuracy`)) +
  geom_point(alpha = 0.6, color = "#0FA8E2") +
  # `linewidth` replaces `size` for lines, which was deprecated in ggplot2 3.4.0
  geom_smooth(method = "lm", se = TRUE, color = "#F10000", linewidth = 0.8, linetype = "dashed") +
  labs(
    title = "Correlation of Fighters Reach to Strike Accuracy (%)",
    x = "Reach (Inches)",
    y = "Strike Accuracy (%)",
    caption = "Source: ufcstats.com"
  ) +
  theme_bw(base_size = 8, base_family = "Georgia") +
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    axis.title = element_text(size = 10),
    axis.text = element_text(size = 10),
    plot.caption = element_text(size = 8)
  )
`geom_smooth()` using formula = 'y ~ x'
# Jitter plot of strike accuracy by stance
ggplot(clean_ufc_stats, aes(x = Stance, y = `Strike Accuracy`, color = Stance)) +
  geom_jitter(width = 0.25, alpha = 0.5, size = 2) +
  labs(
    title = "Strike Accuracy of 4 Fight Stances",
    x = "Fight Stance",
    y = "Strike Accuracy (%)",
    caption = "ufcstats.com",
    color = "Fight Stance"
  ) +
  theme_bw(base_size = 8, base_family = "Georgia") +
  theme(
    plot.title = element_text(size = 14, face = "bold", hjust = 0.5),
    axis.title = element_text(size = 10),
    axis.text = element_text(size = 10),
    plot.caption = element_text(size = 8)
  )
# Filter for the top 15 men's PFP fighters in the UFC
men_pfp <- clean_ufc_stats |>
  filter(`Fighter Name` %in% c(
    "Islam Makhachev", "Jon Jones", "Ilia Topuria", "Merab Dvalishvili",
    "Dricus Du Plessis", "Magomed Ankalaev", "Belal Muhammad", "Carlos Ulberg",
    "Alexander Volkanovski", "Alexandre Pantoja", "Tom Aspinall", "Max Holloway",
    "Sean O'Malley", "Charles Oliveira", "Arman Tsarukyan"
  ))

unique(men_pfp$`Fighter Name`)