Project 2

UFC statistics

UFC Conor McGregor by wallpaper.dog

Introduction

This project analyzes the striking capabilities of UFC fighters based on various physical and performance metrics. The dataset, obtained from Kaggle, contains career statistics for 1673 UFC fighters. It includes variables such as each fighter’s name, height, weight, reach, stance, and birth year, along with performance metrics such as strikes landed per minute (SLpM), striking accuracy percentage (Str_Acc), and takedown averages. The aim of this project is to categorize fighters into their respective weight classes and analyze striking capabilities across these classes using linear regression and visualizations.

Reason for Choosing the Dataset

As a fan of UFC and MMA, I see this dataset as a fascinating opportunity to dive into the statistical side of the sport. Analyzing the data can uncover patterns in how fighters of different sizes strike and build a better understanding of the sport.

Loading Libraries and Dataset

# libraries  
library(readr)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(ggplot2)
Warning: package 'ggplot2' was built under R version 4.3.3
library(plotly)

Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':

    last_plot
The following object is masked from 'package:stats':

    filter
The following object is masked from 'package:graphics':

    layout
library(highcharter)
Warning: package 'highcharter' was built under R version 4.3.3
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
# dataset
ufc_stats <- read_csv("UFC_stats (1).csv")
New names:
• `` -> `...1`
Rows: 1673 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): fighter_name, Stance
dbl (13): ...1, Height, Weight, Reach, Birthyear, SLpM, Str_Acc, SApM, Str_D...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Data Cleaning and Exploration

Mutate Weight Class

# Explore the dataset
head(ufc_stats)
# A tibble: 6 × 15
   ...1 fighter_name    Height Weight Reach Stance Birthyear  SLpM Str_Acc  SApM
  <dbl> <chr>            <dbl>  <dbl> <dbl> <chr>      <dbl> <dbl>   <dbl> <dbl>
1     1 Shamil Abdurak…     75    235    76 Ortho…      1981  2.45      44  2.45
2     2 Daichi Abe          71    170    71 Ortho…      1991  3.8       33  4.49
3     3 Klidson Abreu       72    205    74 Ortho…      1992  2.05      40  2.9 
4     4 Juan Adams          77    265    80 Ortho…      1992  7.09      55  4.06
5     5 Anthony Adams       73    185    76 Ortho…      1988  3.17      41  5.93
6     6 Israel Adesanya     76    185    80 Switch      1989  3.95      49  2.63
# ℹ 5 more variables: Str_Def <dbl>, TD_Avg <dbl>, TD_Acc <dbl>, TD_Def <dbl>,
#   Sub_Avg <dbl>
summary(ufc_stats)
      ...1      fighter_name           Height          Weight     
 Min.   :   1   Length:1673        Min.   :60.00   Min.   :115.0  
 1st Qu.: 419   Class :character   1st Qu.:67.75   1st Qu.:135.0  
 Median : 837   Mode  :character   Median :70.00   Median :155.0  
 Mean   : 837                      Mean   :70.07   Mean   :164.6  
 3rd Qu.:1255                      3rd Qu.:73.00   3rd Qu.:185.0  
 Max.   :1673                      Max.   :83.00   Max.   :265.0  
                                   NA's   :1                      
     Reach          Stance            Birthyear         SLpM       
 Min.   :58.00   Length:1673        Min.   :1963   Min.   : 0.130  
 1st Qu.:69.00   Class :character   1st Qu.:1982   1st Qu.: 2.280  
 Median :72.00   Mode  :character   Median :1987   Median : 3.150  
 Mean   :71.82                      Mean   :1986   Mean   : 3.316  
 3rd Qu.:75.00                      3rd Qu.:1990   3rd Qu.: 4.070  
 Max.   :84.00                      Max.   :1999   Max.   :19.910  
                                    NA's   :3                      
    Str_Acc          SApM           Str_Def          TD_Avg      
 Min.   : 8.0   Min.   : 0.100   Min.   : 4.00   Min.   : 0.000  
 1st Qu.:38.0   1st Qu.: 2.480   1st Qu.:48.00   1st Qu.: 0.440  
 Median :44.0   Median : 3.200   Median :55.00   Median : 1.180  
 Mean   :44.3   Mean   : 3.506   Mean   :53.75   Mean   : 1.548  
 3rd Qu.:50.0   3rd Qu.: 4.150   3rd Qu.:60.00   3rd Qu.: 2.300  
 Max.   :88.0   Max.   :21.180   Max.   :86.00   Max.   :18.000  
                                                                 
     TD_Acc           TD_Def          Sub_Avg       
 Min.   :  0.00   Min.   :  0.00   Min.   : 0.0000  
 1st Qu.: 20.00   1st Qu.: 39.00   1st Qu.: 0.0000  
 Median : 36.00   Median : 60.00   Median : 0.4000  
 Mean   : 35.82   Mean   : 54.95   Mean   : 0.6598  
 3rd Qu.: 50.00   3rd Qu.: 75.00   3rd Qu.: 0.9000  
 Max.   :100.00   Max.   :100.00   Max.   :20.4000  
                                                    
str(ufc_stats)
spc_tbl_ [1,673 × 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ ...1        : num [1:1673] 1 2 3 4 5 6 7 8 9 10 ...
 $ fighter_name: chr [1:1673] "Shamil Abdurakhimov" "Daichi Abe" "Klidson Abreu" "Juan Adams" ...
 $ Height      : num [1:1673] 75 71 72 77 73 76 65 66 63 67 ...
 $ Weight      : num [1:1673] 235 170 205 265 185 185 125 125 115 155 ...
 $ Reach       : num [1:1673] 76 71 74 80 76 80 65 68 63 73 ...
 $ Stance      : chr [1:1673] "Orthodox" "Orthodox" "Orthodox" "Orthodox" ...
 $ Birthyear   : num [1:1673] 1981 1991 1992 1992 1988 ...
 $ SLpM        : num [1:1673] 2.45 3.8 2.05 7.09 3.17 3.95 1.93 3.7 4.93 3.69 ...
 $ Str_Acc     : num [1:1673] 44 33 40 55 41 49 23 50 50 37 ...
 $ SApM        : num [1:1673] 2.45 4.49 2.9 4.06 5.93 2.63 3.35 4.15 7.19 4.47 ...
 $ Str_Def     : num [1:1673] 58 56 55 34 44 61 60 47 53 53 ...
 $ TD_Avg      : num [1:1673] 1.23 0.33 0.64 0.91 0 0 0 1.23 0.94 0.19 ...
 $ TD_Acc      : num [1:1673] 24 50 20 66 0 0 0 66 25 25 ...
 $ TD_Def      : num [1:1673] 47 0 80 57 0 82 0 33 50 87 ...
 $ Sub_Avg     : num [1:1673] 0.2 0 0 0 0 0.3 0 0.6 0.2 0 ...
 - attr(*, "spec")=
  .. cols(
  ..   ...1 = col_double(),
  ..   fighter_name = col_character(),
  ..   Height = col_double(),
  ..   Weight = col_double(),
  ..   Reach = col_double(),
  ..   Stance = col_character(),
  ..   Birthyear = col_double(),
  ..   SLpM = col_double(),
  ..   Str_Acc = col_double(),
  ..   SApM = col_double(),
  ..   Str_Def = col_double(),
  ..   TD_Avg = col_double(),
  ..   TD_Acc = col_double(),
  ..   TD_Def = col_double(),
  ..   Sub_Avg = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 
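# Mutate to create weight class variable
# Each limit is the upper bound (in lbs) of its division; case_when() uses the
# first condition that matches, so the limits must stay in ascending order.
ufc_stats <- ufc_stats %>%
  mutate(weight_class = case_when(
    Weight <= 115 ~ "Strawweight",
    Weight <= 125 ~ "Flyweight",
    Weight <= 135 ~ "Bantamweight",
    Weight <= 145 ~ "Featherweight",
    Weight <= 155 ~ "Lightweight",  # without this line, 155 lb fighters are labeled Welterweight
    Weight <= 170 ~ "Welterweight",
    Weight <= 185 ~ "Middleweight",
    Weight <= 205 ~ "Light Heavyweight",
    Weight <= 265 ~ "Heavyweight",
    TRUE ~ "Above Heavyweight"
  ))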
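
The same mapping can also be written with base R's `cut()`, which returns an ordered factor so that legends and tables follow weight order rather than alphabetical order. The block below is a minimal sketch, not the method used in this project: the helper name `to_weight_class` and the toy weights are hypothetical, and the limits assume the standard UFC upper bounds, including Lightweight at 155 lbs.

```r
# Standard UFC upper weight limits in pounds (assumed; includes Lightweight at 155).
limits <- c(115, 125, 135, 145, 155, 170, 185, 205, 265)
class_names <- c("Strawweight", "Flyweight", "Bantamweight", "Featherweight",
                 "Lightweight", "Welterweight", "Middleweight",
                 "Light Heavyweight", "Heavyweight", "Above Heavyweight")

# Hypothetical helper: map weights to ordered weight classes.
# Intervals are closed on the right, so 155 is Lightweight and 156 is Welterweight.
to_weight_class <- function(weight) {
  cut(weight,
      breaks = c(-Inf, limits, Inf),
      labels = class_names,
      right = TRUE,
      ordered_result = TRUE)
}

# Toy weights for illustration only (not rows from the dataset).
toy_weights <- c(115, 125, 155, 156, 205, 265)
toy_classes <- to_weight_class(toy_weights)
print(data.frame(weight = toy_weights, weight_class = toy_classes))
```

On the real data this would be applied as `ufc_stats$weight_class <- to_weight_class(ufc_stats$Weight)`. Missing weights stay `NA` instead of falling through to a catch-all label.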

Simple Graphs for Exploration

# Plot height vs. weight by weight class with regression line
ggplot(ufc_stats, aes(x = Height, y = Weight, color = weight_class)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Height vs. Weight by Weight Class", x = "Height (inches)", y = "Weight (lbs)") +
  theme_minimal() +
  scale_color_manual(values = c(
    "Strawweight" = "blue", "Flyweight" = "red", "Bantamweight" = "green",
    "Featherweight" = "purple", "Lightweight" = "cyan", "Welterweight" = "orange",
    "Middleweight" = "pink", "Light Heavyweight" = "yellow",
    "Heavyweight" = "brown", "Above Heavyweight" = "grey"
  ))
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 1 row containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_point()`).

# Plot strikes landed per minute vs. striking accuracy by weight class with regression line
ggplot(ufc_stats, aes(x = SLpM, y = Str_Acc, color = weight_class)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Strikes Landed per Minute vs. Striking Accuracy by Weight Class", x = "Strikes Landed per Minute", y = "Striking Accuracy (%)") +
  theme_classic() +
  scale_color_brewer(palette = "Set1")
`geom_smooth()` using formula = 'y ~ x'

# Plot submission average vs. takedown average by weight class with regression line
ggplot(ufc_stats, aes(x = TD_Avg, y = Sub_Avg, color = weight_class)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Submission Average vs. Takedown Average by Weight Class", x = "Takedown Average", y = "Submission Average") +
  theme_bw() +
  scale_color_viridis_d()
`geom_smooth()` using formula = 'y ~ x'

Linear Regression Analysis

# Perform linear regression: SLpM ~ Str_Acc + Weight + Height
model <- lm(SLpM ~ Str_Acc + Weight + Height, data = ufc_stats)
summary(model)

Call:
lm(formula = SLpM ~ Str_Acc + Weight + Height, data = ufc_stats)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.0343 -0.9166 -0.1311  0.7524 14.6039 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.141146   0.952441   0.148  0.88221    
Str_Acc      0.057289   0.003734  15.341  < 2e-16 ***
Weight      -0.005411   0.001685  -3.211  0.00135 ** 
Height       0.021815   0.016423   1.328  0.18425    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.446 on 1668 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.1256,    Adjusted R-squared:  0.1241 
F-statistic: 79.89 on 3 and 1668 DF,  p-value: < 2.2e-16
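# Diagnostic plots: arrange the four standard lm diagnostics in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(model)
par(mfrow = c(1, 1))  # restore the default single-panel layout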

model

Call:
lm(formula = SLpM ~ Str_Acc + Weight + Height, data = ufc_stats)

Coefficients:
(Intercept)      Str_Acc       Weight       Height  
   0.141146     0.057289    -0.005411     0.021815  

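The model indicates that striking accuracy (Str_Acc) is a significant predictor of strikes landed per minute (SLpM): each additional percentage point of accuracy is associated with about 0.057 more strikes landed per minute, holding weight and height constant (p < 2e-16). Weight has a significant but small negative effect (about -0.0054 SLpM per pound, p = 0.00135). Height does not significantly predict SLpM in this model (p = 0.184). The model explains only about 12.6% of the variance in SLpM (R-squared = 0.1256), so most of the variation in striking output is left unexplained by these three predictors.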
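
To make the coefficients concrete, the sketch below plugs them into the regression equation by hand. The coefficient values are copied from the `summary(model)` output; the fighter (50% accuracy, 155 lbs, 70 inches) is a hypothetical example rather than a row from the dataset.

```r
# Coefficients copied from the summary(model) output above.
coefs <- c(intercept = 0.141146, Str_Acc = 0.057289,
           Weight = -0.005411, Height = 0.021815)

# Hypothetical fighter (illustrative values, not a row from the dataset).
fighter <- c(Str_Acc = 50, Weight = 155, Height = 70)

# Predicted SLpM = intercept + sum of (coefficient * predictor value).
predicted_slpm <- unname(coefs["intercept"] + sum(coefs[names(fighter)] * fighter))
print(round(predicted_slpm, 2))

# Change in predicted SLpM for a 10-point gain in accuracy, other inputs fixed.
gain_10_points <- unname(coefs["Str_Acc"] * 10)
print(round(gain_10_points, 2))
```

The prediction comes out to roughly 3.69 strikes per minute, and a 10-point gain in accuracy adds about 0.57. With the fitted object available, `predict(model, newdata = data.frame(Str_Acc = 50, Weight = 155, Height = 70))` should give the same prediction up to rounding of the printed coefficients.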

Interactive Visualization

# Interactive plot using plotly
p <- ggplot(ufc_stats, aes(x = Str_Acc, y = SLpM, color = weight_class)) +
  geom_point() +
  labs(title = "Interactive Plot: Strikes Landed per Minute vs. Striking Accuracy by Weight Class", x = "Striking Accuracy (%)", y = "Strikes Landed per Minute") +
  theme_light() +
  scale_color_viridis_d()

ggplotly(p)
# Interactive plot using highcharter because it looks nicer
hc <- hchart(ufc_stats, "scatter", hcaes(x = Str_Acc, y = SLpM, group = weight_class)) %>%
  hc_title(text = "Interactive Plot: Strikes Landed per Minute vs. Striking Accuracy by Weight Class") %>%
  hc_xAxis(title = list(text = "Striking Accuracy (%)")) %>%
  hc_yAxis(title = list(text = "Strikes Landed per Minute")) %>%
  hc_add_theme(hc_theme_flat())

hc

Graph on stance

# Prepare data for alluvial plot
ufc_stats_alluvial <- ufc_stats %>%
  group_by(weight_class, Stance) %>%
  summarize(count = n(), .groups = "drop")  # .groups = "drop" ungroups the result and silences the grouping message
# Create alluvial plot using highcharter
hcalluvial <- hchart(ufc_stats_alluvial, type = "sankey", hcaes(from = weight_class, to = Stance, weight = count)) %>%
  hc_title(text = "Alluvial Plot: Weight Class vs. Stance") %>%
  hc_add_theme(hc_theme_darkunica())

hcalluvial

Background

Mixed martial arts (MMA), particularly UFC, is a fast-growing sport that combines various fighting disciplines. Understanding the performance and abilities of the fighters can offer many insights into their fighting styles and effectiveness in different weight classes. Analyzing these stats can provide valuable information for fighters and coaches, helping them improve training and strategies.

Visualization Analysis

The visualizations created, including height vs. weight and strikes landed per minute vs. striking accuracy, provide insights into the distribution and performance of fighters across different weight classes. The linear regression analysis suggests that striking accuracy and weight are significantly associated with the number of strikes landed per minute, while height is not. The interactive plots enhance the exploration of these relationships, making it easier to identify patterns and trends.

Conclusion

This project categorized UFC fighters into their respective weight classes and analyzed their striking capabilities using statistical and visualization techniques. The findings describe how fighters perform across weight classes and highlight the importance of striking accuracy and, to a lesser extent, weight for striking output. Future analyses could explore additional variables or include more advanced statistical techniques to further understand the factors influencing fighter performance.