Exploratory Data Analysis

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(readxl)
library(tidyr) 

dataset <- read_excel("~/Downloads/UFC_Dataset.xls")

# Filter dataset to include only male fights and remove catchweights
filtered_dataset <- dataset |>
  filter(Gender == "MALE" & WeightClass != "Catch Weight")

# Flag fights won by the Blue corner (1 = Blue won, 0 = Blue did not win)
# WinnerLabel describes the Blue corner's outcome in each fight
filtered_dataset <- filtered_dataset |>
  mutate(
    BlueWins = ifelse(Winner == "Blue", 1, 0),
    WinnerLabel = ifelse(BlueWins == 1, "Winner", "Loser")
  )

# Counts for the fights in each weight class
weight_class_counts <- filtered_dataset |>
  group_by(WeightClass) |>
  summarise(Count = n()) |>
  arrange(desc(Count))

print(weight_class_counts)
## # A tibble: 8 × 2
##   WeightClass       Count
##   <chr>             <int>
## 1 Lightweight        1011
## 2 Welterweight        965
## 3 Middleweight        729
## 4 Featherweight       694
## 5 Bantamweight        617
## 6 Light Heavyweight   471
## 7 Heavyweight         462
## 8 Flyweight           309
# Define key factors
key_factors <- c("HeightDif", "ReachDif", "AgeDif", "SigStrDif", "AvgSubAttDif", "AvgTDDif")

# Create a summary table for Key Factors
# Statistic names must not contain "_", because pivot_longer() splits on it below
summary_df <- filtered_dataset |>
  select(all_of(key_factors)) |>
  summarise(across(everything(), list(
    Min = ~min(., na.rm = TRUE),
    `1st` = ~quantile(., 0.25, na.rm = TRUE),
    Median = ~median(., na.rm = TRUE),
    Mean = ~mean(., na.rm = TRUE),
    `3rd` = ~quantile(., 0.75, na.rm = TRUE),
    Max = ~max(., na.rm = TRUE)
  ))) |>
  pivot_longer(cols = everything(), names_to = c("Factor", "Statistic"), names_sep = "_") |>
  pivot_wider(names_from = Statistic, values_from = value)
print(summary_df)
## # A tibble: 6 × 7
##   Factor          Min   `1st` Median    Mean `3rd`   Max
##   <chr>         <dbl>   <dbl>  <dbl>   <dbl> <dbl> <dbl>
## 1 HeightDif    -188.   -5.08   0      0.0429 5.08   30.5
## 2 ReachDif     -188.   -5.08   0     -0.254  5.08   30.5
## 3 AgeDif        -17    -3      0      0.220  4      17  
## 4 SigStrDif    -113   -10.8   -0.454 -2.92   2.75  128. 
## 5 AvgSubAttDif   -8.3  -0.431  0     -0.0692 0.25    7.8
## 6 AvgTDDif      -11    -1      0     -0.172  0.667  10.9
# -------------------------------
# Graphs: Boxplots and Density Plots
# -------------------------------
factors <- key_factors  # the same six differentials defined above

# Boxplots
for (factor in factors) {
  p <- ggplot(filtered_dataset, aes(x = WinnerLabel, y = .data[[factor]], fill = WinnerLabel)) +
    geom_boxplot() +
    labs(title = paste("Comparison of", factor, "between Winners and Losers"),
         x = "Fight Outcome",
         y = factor) +
    theme_minimal() +
    theme(legend.position = "none") +
    scale_fill_manual(values = c("Winner" = "#56B4E9", "Loser" = "#E69F00"))
  
  print(p)
}

# Density Plots
for (factor in factors) {
  p <- ggplot(filtered_dataset, aes(x = .data[[factor]], fill = WinnerLabel)) +
    geom_density(alpha = 0.6) +
    labs(title = paste("Density Plot of", factor, "for Winners and Losers"),
         x = factor,
         y = "Density") +
    theme_minimal() +
    scale_fill_manual(values = c("Winner" = "#56B4E9", "Loser" = "#E69F00"))
  
  print(p)
}

Counts:

Lightweight and Welterweight have the most fights (1,011 and 965). This makes sense: these weights are the most representative of the general population, so more fighters compete at them, which produces a larger total number of fights. Flyweight has the fewest fights (309), likely for several reasons. Combat sports have long favored the heavier divisions because they garner more public attention, so the lighter weight classes are often overlooked despite being very skilled and entertaining.

Summary Stats:

The summary table summarizes each of the 6 key factors our analysis focuses on. Each is a differential, calculated by subtracting the red fighter’s metric from the blue fighter’s metric. These are overall statistics, not filtered by weight class or by winner/loser. On their own, they mainly help us understand what the differentials typically look like from fight to fight; without filtering by weight class, they carry limited meaning.
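
To make the differential definition concrete, here is a minimal sketch in base R on a made-up data frame. The column names BlueReach and RedReach and the helper six_number() are hypothetical and are not taken from the UFC dataset; the sketch only shows the blue-minus-red subtraction and the same six statistics computed per weight class rather than overall.

```r
# Toy data with invented values; BlueReach / RedReach are hypothetical column names
fights <- data.frame(
  WeightClass = c("Lightweight", "Lightweight", "Lightweight",
                  "Heavyweight", "Heavyweight", "Heavyweight"),
  BlueReach   = c(180, 175, 183, 200, 195, 198),
  RedReach    = c(178, 180, 183, 190, 201, 198)
)

# Differential = blue fighter's metric minus red fighter's metric
fights$ReachDif <- fights$BlueReach - fights$RedReach

# Hypothetical helper: the six statistics used in the summary table
six_number <- function(x) {
  c(Min    = min(x, na.rm = TRUE),
    Q1     = unname(quantile(x, 0.25, na.rm = TRUE)),
    Median = median(x, na.rm = TRUE),
    Mean   = mean(x, na.rm = TRUE),
    Q3     = unname(quantile(x, 0.75, na.rm = TRUE)),
    Max    = max(x, na.rm = TRUE))
}

# One row per weight class, one column per statistic
reach_summary <- t(sapply(split(fights$ReachDif, fights$WeightClass), six_number))
print(reach_summary)
```

Splitting by weight class before summarizing is what the later sections do for the t-tests; this sketch applies the same idea to the descriptive statistics.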

Graphs:

The box plots show no clearly visible difference in these statistics between winners and losers. Age and takedowns show a slight disparity, but not one large enough to be actionable. The density plots show more of the same; it is hard to derive meaningful insights from these counts, statistics, and graphs alone. The one metric that appears to stand out is strikes landed. In both sets of graphs, winners tend to sit on the positive side of this metric and losers on the negative side. The effect is hard to see in the graphs, but it is there nonetheless.
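
A numeric check can complement the plots when a difference is hard to see by eye. The sketch below uses a small made-up data frame (the values are invented; only the column names mirror the analysis) and computes, for each outcome group, the share of positive SigStrDif values and the group median.

```r
# Invented values for illustration only; not drawn from the UFC dataset
toy <- data.frame(
  WinnerLabel = c("Winner", "Winner", "Winner", "Winner",
                  "Loser", "Loser", "Loser", "Loser"),
  SigStrDif   = c(5.2, -1.0, 3.3, 0.8, -4.1, 2.0, -0.5, -7.9)
)

# Share of fights in each group where the differential is positive
share_positive <- tapply(toy$SigStrDif > 0, toy$WinnerLabel, mean)

# Median differential in each group
group_median <- tapply(toy$SigStrDif, toy$WinnerLabel, median)

print(share_positive)
print(group_median)
```

On the real data, the same two tapply() calls could be run against filtered_dataset to see whether the pattern suggested by the plots holds numerically.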

Insights and Application

The graphs and numbers above suggest that most of the factors have little visible effect on winning a fight. It was here that I realized I was missing something very important. At first I was looking at the data as a whole, but the fight game has always been divided into weight classes. Since evaluating the dataset as a whole did not reveal significant differences, we needed to improve the rigor of our analysis and get more specific.

Hypothesis Testing

-Logic: The goal is to determine which key factors, if any, most influence victory in the UFC at each weight class.

-Key Factors: Significant Strikes Landed, Average Submissions Attempted, Average Takedowns Landed, Height, Reach, and Age.

Hypothesis: Are there statistically significant differences in the key factors between winners and losers of a fight? If so, how do these key factors directly influence winning?

Null: There are no statistically significant differences in the key factors between winners and losers of a fight, and thus none of the factors directly influence victory.

T-Tests and Analysis

We will use grouping functions to group the data by weight class and separate winners from losers. We will then use t-tests to determine whether the mean of each of our 6 key factors differs between winners and losers. This is done for each weight class, so there are 6 t-tests per weight class. We will interpret each test individually; any factor found to be significantly different will be further analyzed with a GLM to understand how it influences winning.
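
As a minimal sketch of what a single one of these tests does, the example below runs one t-test on simulated data (not the UFC dataset; the effect size is invented). It assumes the same setup as the analysis: a 0/1 BlueWins column and a numeric differential. Note that t.test() in R performs Welch's unequal-variance test by default, which matches the "Welch Two Sample t-test" label in the output.

```r
set.seed(123)  # simulated data, for illustration only

toy <- data.frame(BlueWins = rep(c(0, 1), each = 100))
# Hypothetical effect: ReachDif averages 2 units higher in fights Blue won
toy$ReachDif <- rnorm(200, mean = ifelse(toy$BlueWins == 1, 1, -1), sd = 5)

# The formula splits ReachDif into two groups by BlueWins
tt <- t.test(ReachDif ~ BlueWins, data = toy)

print(tt$method)     # which test was run
print(tt$estimate)   # group means: group 0 = Blue lost, group 1 = Blue won
print(tt$statistic)  # based on (mean of group 0) minus (mean of group 1)
print(tt$p.value)    # two-sided p-value
```

Because the statistic is built from group 0 minus group 1, a negative t value means the differential was, on average, higher in the fights Blue won.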

# -------------------------------
# T-tests: Comparing Factors by Weight Class
# -------------------------------
# Group the data by WeightClass and calculate summary statistics
summary_stats_by_weightclass <- filtered_dataset |>
  group_by(WeightClass, BlueWins) |>
  summarise(
    AvgHeightDif = mean(HeightDif, na.rm = TRUE),
    AvgReachDif = mean(ReachDif, na.rm = TRUE),
    AvgAgeDif = mean(AgeDif, na.rm = TRUE),
    AvgSigStrDif = mean(SigStrDif, na.rm = TRUE),
    AvgSubAttDif = mean(AvgSubAttDif, na.rm = TRUE),
    AvgTDDif = mean(AvgTDDif, na.rm = TRUE)
  ) |>
  arrange(WeightClass, BlueWins)
## `summarise()` has grouped output by 'WeightClass'. You can override using the
## `.groups` argument.
print(summary_stats_by_weightclass)
## # A tibble: 16 × 8
## # Groups:   WeightClass [8]
##    WeightClass       BlueWins AvgHeightDif AvgReachDif AvgAgeDif AvgSigStrDif
##    <chr>                <dbl>        <dbl>       <dbl>     <dbl>        <dbl>
##  1 Bantamweight             0       0.0137      0.301     0.0350       -5.05 
##  2 Bantamweight             1       0.165      -0.0164    0.642        -1.88 
##  3 Featherweight            0       0.192      -0.801    -0.478        -3.95 
##  4 Featherweight            1       0.928       0.903     0.889        -1.38 
##  5 Flyweight                0       0.896       0.0460    0.123        -5.11 
##  6 Flyweight                1       0.0208     -0.164     0.705        -2.43 
##  7 Heavyweight              0      -1.30       -2.59      0.0719       -3.33 
##  8 Heavyweight              1       0.552       1.50      0.304         0.410
##  9 Light Heavyweight        0      -0.859      -2.55     -0.823        -3.76 
## 10 Light Heavyweight        1       0.568       1.16      0.507        -2.25 
## 11 Lightweight              0       0.368      -0.0918   -0.169        -4.58 
## 12 Lightweight              1       0.340       0.250     0.577        -2.02 
## 13 Middleweight             0      -0.478      -1.08      0.144        -3.54 
## 14 Middleweight             1       0.219       0.211     0.594        -0.788
## 15 Welterweight             0      -0.589      -0.573     0.350        -2.61 
## 16 Welterweight             1       0.317       0.417     0.700        -1.70 
## # ℹ 2 more variables: AvgSubAttDif <dbl>, AvgTDDif <dbl>
# Function to perform t-tests for each weight class
perform_t_tests <- function(data, weight_class) {
  data_subset <- filter(data, WeightClass == weight_class)
  
  t_height <- t.test(HeightDif ~ BlueWins, data = data_subset)
  t_reach <- t.test(ReachDif ~ BlueWins, data = data_subset)
  t_age <- t.test(AgeDif ~ BlueWins, data = data_subset)
  t_sig_str <- t.test(SigStrDif ~ BlueWins, data = data_subset)
  t_sub_att <- t.test(AvgSubAttDif ~ BlueWins, data = data_subset)
  t_td <- t.test(AvgTDDif ~ BlueWins, data = data_subset)
  
  list(
    WeightClass = weight_class,
    T_Height = t_height,
    T_Reach = t_reach,
    T_Age = t_age,
    T_SigStr = t_sig_str,
    T_SubAtt = t_sub_att,
    T_TD = t_td
  )
}

# Get the unique weight classes and perform t-tests
weight_classes <- unique(filtered_dataset$WeightClass)
t_test_results <- lapply(weight_classes, perform_t_tests, data = filtered_dataset)

t_test_results
## [[1]]
## [[1]]$WeightClass
## [1] "Middleweight"
## 
## [[1]]$T_Height
## 
##  Welch Two Sample t-test
## 
## data:  HeightDif by BlueWins
## t = -1.5588, df = 700.58, p-value = 0.1195
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.5741279  0.1808228
## sample estimates:
## mean in group 0 mean in group 1 
##      -0.4778218       0.2188308 
## 
## 
## [[1]]$T_Reach
## 
##  Welch Two Sample t-test
## 
## data:  ReachDif by BlueWins
## t = -2.0994, df = 701.93, p-value = 0.03614
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -2.49431543 -0.08350879
## sample estimates:
## mean in group 0 mean in group 1 
##      -1.0780198       0.2108923 
## 
## 
## [[1]]$T_Age
## 
##  Welch Two Sample t-test
## 
## data:  AgeDif by BlueWins
## t = -1.1686, df = 679.81, p-value = 0.243
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.2068191  0.3062555
## sample estimates:
## mean in group 0 mean in group 1 
##       0.1435644       0.5938462 
## 
## 
## [[1]]$T_SigStr
## 
##  Welch Two Sample t-test
## 
## data:  SigStrDif by BlueWins
## t = -2.0466, df = 726.51, p-value = 0.04105
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -5.3941349 -0.1121916
## sample estimates:
## mean in group 0 mean in group 1 
##      -3.5414676      -0.7883043 
## 
## 
## [[1]]$T_SubAtt
## 
##  Welch Two Sample t-test
## 
## data:  AvgSubAttDif by BlueWins
## t = -2.4101, df = 721.92, p-value = 0.0162
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.2806018 -0.0286684
## sample estimates:
## mean in group 0 mean in group 1 
##     -0.10657450      0.04806062 
## 
## 
## [[1]]$T_TD
## 
##  Welch Two Sample t-test
## 
## data:  AvgTDDif by BlueWins
## t = -2.7688, df = 718.48, p-value = 0.005771
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.56731943 -0.09657776
## sample estimates:
## mean in group 0 mean in group 1 
##     -0.31608490      0.01586369 
## 
## 
## 
## [[2]]
## [[2]]$WeightClass
## [1] "Featherweight"
## 
## [[2]]$T_Height
## 
##  Welch Two Sample t-test
## 
## data:  HeightDif by BlueWins
## t = -1.4236, df = 596.62, p-value = 0.1551
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.7507711  0.2792804
## sample estimates:
## mean in group 0 mean in group 1 
##       0.1921513       0.9278967 
## 
## 
## [[2]]$T_Reach
## 
##  Welch Two Sample t-test
## 
## data:  ReachDif by BlueWins
## t = -2.3025, df = 689.1, p-value = 0.0216
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -3.156853 -0.250937
## sample estimates:
## mean in group 0 mean in group 1 
##      -0.8010165       0.9028782 
## 
## 
## [[2]]$T_Age
## 
##  Welch Two Sample t-test
## 
## data:  AgeDif by BlueWins
## t = -3.8928, df = 601.68, p-value = 0.0001102
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -2.0564184 -0.6772621
## sample estimates:
## mean in group 0 mean in group 1 
##      -0.4775414       0.8892989 
## 
## 
## [[2]]$T_SigStr
## 
##  Welch Two Sample t-test
## 
## data:  SigStrDif by BlueWins
## t = -1.5979, df = 579.59, p-value = 0.1106
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -5.7317918  0.5892445
## sample estimates:
## mean in group 0 mean in group 1 
##       -3.948546       -1.377272 
## 
## 
## [[2]]$T_SubAtt
## 
##  Welch Two Sample t-test
## 
## data:  AvgSubAttDif by BlueWins
## t = -0.6123, df = 637.82, p-value = 0.5406
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.18718871  0.09820122
## sample estimates:
## mean in group 0 mean in group 1 
##     -0.07571773     -0.03122399 
## 
## 
## [[2]]$T_TD
## 
##  Welch Two Sample t-test
## 
## data:  AvgTDDif by BlueWins
## t = -0.87298, df = 550.99, p-value = 0.3831
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.3832405  0.1474067
## sample estimates:
## mean in group 0 mean in group 1 
##     -0.19553168     -0.07761476 
## 
## 
## 
## [[3]]
## [[3]]$WeightClass
## [1] "Lightweight"
## 
## [[3]]$T_Height
## 
##  Welch Two Sample t-test
## 
## data:  HeightDif by BlueWins
## t = 0.070782, df = 928.69, p-value = 0.9436
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.7503935  0.8065478
## sample estimates:
## mean in group 0 mean in group 1 
##       0.3683642       0.3402871 
## 
## 
## [[3]]$T_Reach
## 
##  Welch Two Sample t-test
## 
## data:  ReachDif by BlueWins
## t = -0.67441, df = 897.86, p-value = 0.5002
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.3352749  0.6522882
## sample estimates:
## mean in group 0 mean in group 1 
##     -0.09180438      0.24968900 
## 
## 
## [[3]]$T_Age
## 
##  Welch Two Sample t-test
## 
## data:  AgeDif by BlueWins
## t = -2.3592, df = 852.96, p-value = 0.01854
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.365155 -0.125223
## sample estimates:
## mean in group 0 mean in group 1 
##      -0.1686341       0.5765550 
## 
## 
## [[3]]$T_SigStr
## 
##  Welch Two Sample t-test
## 
## data:  SigStrDif by BlueWins
## t = -2.0263, df = 908.25, p-value = 0.04302
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -5.04417938 -0.08062694
## sample estimates:
## mean in group 0 mean in group 1 
##       -4.580945       -2.018542 
## 
## 
## [[3]]$T_SubAtt
## 
##  Welch Two Sample t-test
## 
## data:  AvgSubAttDif by BlueWins
## t = -2.4373, df = 878.38, p-value = 0.01499
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.22097638 -0.02383766
## sample estimates:
## mean in group 0 mean in group 1 
##     -0.17518094     -0.05277392 
## 
## 
## [[3]]$T_TD
## 
##  Welch Two Sample t-test
## 
## data:  AvgTDDif by BlueWins
## t = -5.1609, df = 943.77, p-value = 2.997e-07
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.7671339 -0.3444433
## sample estimates:
## mean in group 0 mean in group 1 
##     -0.50437470      0.05141388 
## 
## 
## 
## [[4]]
## [[4]]$WeightClass
## [1] "Welterweight"
## 
## [[4]]$T_Height
## 
##  Welch Two Sample t-test
## 
## data:  HeightDif by BlueWins
## t = -2.2435, df = 881.9, p-value = 0.02511
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.6974488 -0.1133277
## sample estimates:
## mean in group 0 mean in group 1 
##      -0.5886496       0.3167386 
## 
## 
## [[4]]$T_Reach
## 
##  Welch Two Sample t-test
## 
## data:  ReachDif by BlueWins
## t = -1.8797, df = 919.63, p-value = 0.06047
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -2.02302123  0.04362854
## sample estimates:
## mean in group 0 mean in group 1 
##      -0.5726460       0.4170504 
## 
## 
## [[4]]$T_Age
## 
##  Welch Two Sample t-test
## 
## data:  AgeDif by BlueWins
## t = -1.0541, df = 881.26, p-value = 0.2921
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.0013328  0.3015832
## sample estimates:
## mean in group 0 mean in group 1 
##       0.3503650       0.7002398 
## 
## 
## [[4]]$T_SigStr
## 
##  Welch Two Sample t-test
## 
## data:  SigStrDif by BlueWins
## t = -0.68317, df = 937.37, p-value = 0.4947
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -3.495547  1.690294
## sample estimates:
## mean in group 0 mean in group 1 
##       -2.606943       -1.704317 
## 
## 
## [[4]]$T_SubAtt
## 
##  Welch Two Sample t-test
## 
## data:  AvgSubAttDif by BlueWins
## t = -1.3288, df = 949.41, p-value = 0.1842
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.17357149  0.03341805
## sample estimates:
## mean in group 0 mean in group 1 
##     -0.08796113     -0.01788441 
## 
## 
## [[4]]$T_TD
## 
##  Welch Two Sample t-test
## 
## data:  AvgTDDif by BlueWins
## t = -3.6951, df = 827.4, p-value = 0.0002342
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.6063173 -0.1856363
## sample estimates:
## mean in group 0 mean in group 1 
##     -0.34752044      0.04845635 
## 
## 
## 
## [[5]]
## [[5]]$WeightClass
## [1] "Light Heavyweight"
## 
## [[5]]$T_Height
## 
##  Welch Two Sample t-test
## 
## data:  HeightDif by BlueWins
## t = -2.5328, df = 438.09, p-value = 0.01166
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -2.5348487 -0.3197531
## sample estimates:
## mean in group 0 mean in group 1 
##      -0.8593985       0.5679024 
## 
## 
## [[5]]$T_Reach
## 
##  Welch Two Sample t-test
## 
## data:  ReachDif by BlueWins
## t = -4.5727, df = 454.55, p-value = 6.215e-06
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -5.298294 -2.113136
## sample estimates:
## mean in group 0 mean in group 1 
##       -2.546203        1.159512 
## 
## 
## [[5]]$T_Age
## 
##  Welch Two Sample t-test
## 
## data:  AgeDif by BlueWins
## t = -2.4606, df = 417.66, p-value = 0.01428
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -2.3936089 -0.2676418
## sample estimates:
## mean in group 0 mean in group 1 
##      -0.8233083       0.5073171 
## 
## 
## [[5]]$T_SigStr
## 
##  Welch Two Sample t-test
## 
## data:  SigStrDif by BlueWins
## t = -0.86476, df = 438.12, p-value = 0.3876
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -4.941658  1.921791
## sample estimates:
## mean in group 0 mean in group 1 
##       -3.759615       -2.249682 
## 
## 
## [[5]]$T_SubAtt
## 
##  Welch Two Sample t-test
## 
## data:  AvgSubAttDif by BlueWins
## t = 0.38558, df = 468.26, p-value = 0.7
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.0948834  0.1412097
## sample estimates:
## mean in group 0 mean in group 1 
##     -0.05031391     -0.07347707 
## 
## 
## [[5]]$T_TD
## 
##  Welch Two Sample t-test
## 
## data:  AvgTDDif by BlueWins
## t = 0.41796, df = 435.12, p-value = 0.6762
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.2389228  0.3679866
## sample estimates:
## mean in group 0 mean in group 1 
##     -0.08689248     -0.15142439 
## 
## 
## 
## [[6]]
## [[6]]$WeightClass
## [1] "Bantamweight"
## 
## [[6]]$T_Height
## 
##  Welch Two Sample t-test
## 
## data:  HeightDif by BlueWins
## t = -0.3006, df = 487.71, p-value = 0.7638
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.141829  0.838808
## sample estimates:
## mean in group 0 mean in group 1 
##      0.01369272      0.16520325 
## 
## 
## [[6]]$T_Reach
## 
##  Welch Two Sample t-test
## 
## data:  ReachDif by BlueWins
## t = 0.45032, df = 530.27, p-value = 0.6527
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.065993  1.700078
## sample estimates:
## mean in group 0 mean in group 1 
##      0.30061995     -0.01642276 
## 
## 
## [[6]]$T_Age
## 
##  Welch Two Sample t-test
## 
## data:  AgeDif by BlueWins
## t = -1.4141, df = 516.59, p-value = 0.1579
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.4508779  0.2364059
## sample estimates:
## mean in group 0 mean in group 1 
##      0.03504043      0.64227642 
## 
## 
## [[6]]$T_SigStr
## 
##  Welch Two Sample t-test
## 
## data:  SigStrDif by BlueWins
## t = -1.9001, df = 511.39, p-value = 0.05798
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -6.4637943  0.1078237
## sample estimates:
## mean in group 0 mean in group 1 
##       -5.053142       -1.875157 
## 
## 
## [[6]]$T_SubAtt
## 
##  Welch Two Sample t-test
## 
## data:  AvgSubAttDif by BlueWins
## t = -1.8422, df = 543.61, p-value = 0.06599
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.275096980  0.008827292
## sample estimates:
## mean in group 0 mean in group 1 
##     -0.10649623      0.02663862 
## 
## 
## [[6]]$T_TD
## 
##  Welch Two Sample t-test
## 
## data:  AvgTDDif by BlueWins
## t = -3.6722, df = 472.1, p-value = 0.000268
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.9858669 -0.2985614
## sample estimates:
## mean in group 0 mean in group 1 
##      -0.2479385       0.3942756 
## 
## 
## 
## [[7]]
## [[7]]$WeightClass
## [1] "Flyweight"
## 
## [[7]]$T_Height
## 
##  Welch Two Sample t-test
## 
## data:  HeightDif by BlueWins
## t = 1.456, df = 279.53, p-value = 0.1465
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.3082369  2.0595388
## sample estimates:
## mean in group 0 mean in group 1 
##      0.89647059      0.02081967 
## 
## 
## [[7]]$T_Reach
## 
##  Welch Two Sample t-test
## 
## data:  ReachDif by BlueWins
## t = 0.26274, df = 251.18, p-value = 0.793
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.366117  1.786727
## sample estimates:
## mean in group 0 mean in group 1 
##      0.04604278     -0.16426230 
## 
## 
## [[7]]$T_Age
## 
##  Welch Two Sample t-test
## 
## data:  AgeDif by BlueWins
## t = -0.98045, df = 242.72, p-value = 0.3278
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.7510429  0.5871961
## sample estimates:
## mean in group 0 mean in group 1 
##       0.1229947       0.7049180 
## 
## 
## [[7]]$T_SigStr
## 
##  Welch Two Sample t-test
## 
## data:  SigStrDif by BlueWins
## t = -1.077, df = 246.21, p-value = 0.2825
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -7.564662  2.216416
## sample estimates:
## mean in group 0 mean in group 1 
##       -5.106238       -2.432115 
## 
## 
## [[7]]$T_SubAtt
## 
##  Welch Two Sample t-test
## 
## data:  AvgSubAttDif by BlueWins
## t = -1.033, df = 283.57, p-value = 0.3025
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.3694268  0.1151237
## sample estimates:
## mean in group 0 mean in group 1 
##     -0.11450321      0.01264836 
## 
## 
## [[7]]$T_TD
## 
##  Welch Two Sample t-test
## 
## data:  AvgTDDif by BlueWins
## t = -1.1145, df = 272.07, p-value = 0.266
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.6574816  0.1821512
## sample estimates:
## mean in group 0 mean in group 1 
##      -0.4414561      -0.2037910 
## 
## 
## 
## [[8]]
## [[8]]$WeightClass
## [1] "Heavyweight"
## 
## [[8]]$T_Height
## 
##  Welch Two Sample t-test
## 
## data:  HeightDif by BlueWins
## t = -1.793, df = 455.95, p-value = 0.07363
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -3.8767318  0.1775638
## sample estimates:
## mean in group 0 mean in group 1 
##      -1.2974101       0.5521739 
## 
## 
## [[8]]$T_Reach
## 
##  Welch Two Sample t-test
## 
## data:  ReachDif by BlueWins
## t = -3.4768, df = 459.33, p-value = 0.0005557
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -6.405747 -1.779356
## sample estimates:
## mean in group 0 mean in group 1 
##       -2.587878        1.504674 
## 
## 
## [[8]]$T_Age
## 
##  Welch Two Sample t-test
## 
## data:  AgeDif by BlueWins
## t = -0.40237, df = 354.69, p-value = 0.6877
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -1.3683485  0.9035377
## sample estimates:
## mean in group 0 mean in group 1 
##      0.07194245      0.30434783 
## 
## 
## [[8]]$T_SigStr
## 
##  Welch Two Sample t-test
## 
## data:  SigStrDif by BlueWins
## t = -2.0839, df = 412.9, p-value = 0.03779
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -7.2770557 -0.2123065
## sample estimates:
## mean in group 0 mean in group 1 
##      -3.3346827       0.4099984 
## 
## 
## [[8]]$T_SubAtt
## 
##  Welch Two Sample t-test
## 
## data:  AvgSubAttDif by BlueWins
## t = 0.14177, df = 431.71, p-value = 0.8873
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.1490500  0.1722243
## sample estimates:
## mean in group 0 mean in group 1 
##     -0.06642374     -0.07801087 
## 
## 
## [[8]]$T_TD
## 
##  Welch Two Sample t-test
## 
## data:  AvgTDDif by BlueWins
## t = -3.3076, df = 381.21, p-value = 0.001031
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -0.8984540 -0.2285157
## sample estimates:
## mean in group 0 mean in group 1 
##      -0.3784903       0.1849946

Hypothesis Testing Results & T-Test Interpretation

Flyweight (125lbs)

There is no reason to reject the null hypothesis, since none of the six factors were significantly different between winners and losers.

This absence of significant factors could have a few explanations.

1.) The flyweight division has the smallest sample size of any weight class.

2.) Flyweight is a very fast and technical division. To succeed, a fighter must be well rounded, which may make it difficult to identify a single key factor.

Bantamweight (135lbs)

There is reason to reject the null hypothesis for the take downs variable. Take downs landed were significantly different between winners and losers.

P-value for Take downs = 0.000268

Featherweight (145lbs)

There is reason to reject the null hypothesis for the Reach and Age variables. Reach and Age were significantly different between winners and losers.

P-value for Reach = 0.0216

P-value for Age = 0.0001102

Lightweight (155lbs)

There is reason to reject the null hypothesis for the Age, Strikes Landed, Submissions Attempted, and Take downs Landed variables. Age, Strikes Landed, Submissions Attempted, and Take downs Landed were significantly different between winners and losers.

P-value for Age = 0.01854

P-value for Strikes Landed = 0.04302

P-value for Submissions Attempted = 0.01499

P-value for Take downs Landed = 2.997e-7

Welterweight (170lbs)

There is reason to reject the null hypothesis for the Height and Take downs Landed variables. Height and Take downs Landed were significantly different between winners and losers.

P-value for Height = 0.02511

P-value for Take downs Landed = 0.0002342

Middleweight (185lbs)

There is reason to reject the null hypothesis for the Reach, Strikes Landed, Submissions Attempted, and Take downs Landed variables. Reach, Strikes Landed, Submissions Attempted, and Take downs Landed were significantly different between winners and losers.

P-value for Reach = 0.03614

P-value for Strikes Landed = 0.04105

P-value for Submissions Attempted = 0.0162

P-value for Take downs Landed = 0.005771

Light Heavyweight (205lbs)

There is reason to reject the null hypothesis for the Reach, Height, and Age variables. Reach, Height, and Age were significantly different between winners and losers.

P-value for Height = 0.01166

P-value for Reach = 6.215e-6

P-value for Age = 0.01428

Heavyweight (265lbs)

There is reason to reject the null hypothesis for the Reach, Strikes Landed, and Take downs Landed variables. Reach, Strikes Landed, and Take downs Landed were significantly different between winners and losers.

P-value for Reach = 0.0005557

P-value for Strikes Landed = 0.03779

P-value for Take downs Landed = 0.001031

Interpretation

The flyweight division is the only weight class in which we fail to reject the null hypothesis for every variable. There are a few possible reasons. First, the flyweight division is fast-paced and highly technical, which forces fighters to become very well rounded. Lighter fighters have to be more skilled and technical since they can’t rely on the natural advantages that come with going up in weight. This well-rounded style, more essential in this division than in others, may be why it is difficult to find a significant variable. Second, as the counts earlier show, flyweight has the fewest fights (309), so the tests have less power to detect a difference. Next, we use the GLM to address the second part of the hypothesis: for the factors found to be significant, we fit a GLM to estimate their effect on winning.

Generalized Linear Model

Model Overview:

Dependent Variable: Winner of the fight (BlueWins)

Model: GLM with Binomial Distribution (Logistic Regression)

Observations: Filtered by weight class, see counts above.

# -------------------------------
# GLM: Logistic Regression by Weight Class
# -------------------------------
# Significant factors for each weight class
weight_classes_significant_factors <- list(
  "Bantamweight" = c("AvgTDDif"),
  "Featherweight" = c("ReachDif", "AgeDif"),
  "Lightweight" = c("AgeDif", "SigStrDif", "AvgSubAttDif", "AvgTDDif"),
  "Welterweight" = c("HeightDif", "AvgTDDif"),
  "Middleweight" = c("ReachDif", "SigStrDif", "AvgSubAttDif", "AvgTDDif"),
  "Light Heavyweight" = c("ReachDif", "HeightDif", "AgeDif"),
  "Heavyweight" = c("ReachDif", "SigStrDif", "AvgTDDif")
)

glm_results <- list()

for (weight_class in names(weight_classes_significant_factors)) {
  significant_factors <- weight_classes_significant_factors[[weight_class]]
  
  data_subset <- filtered_dataset |>
    filter(WeightClass == weight_class) |>
    select(BlueWins, all_of(significant_factors))
  
  glm_model <- glm(BlueWins ~ ., data = data_subset, family = binomial)
  glm_results[[weight_class]] <- summary(glm_model)
}

for (weight_class in names(glm_results)) {
  cat("\n\n---", weight_class, "---\n")
  print(glm_results[[weight_class]])
}
## 
## 
## --- Bantamweight ---
## 
## Call:
## glm(formula = BlueWins ~ ., family = binomial, data = data_subset)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.42085    0.08330  -5.052 4.37e-07 ***
## AvgTDDif     0.15451    0.04239   3.645 0.000267 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 829.84  on 616  degrees of freedom
## Residual deviance: 815.53  on 615  degrees of freedom
## AIC: 819.53
## 
## Number of Fisher Scoring iterations: 4
## 
## 
## 
## --- Featherweight ---
## 
## Call:
## glm(formula = BlueWins ~ ., family = binomial, data = data_subset)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.460625   0.079129  -5.821 5.84e-09 ***
## ReachDif     0.017168   0.009834   1.746 0.080858 .  
## AgeDif       0.062059   0.017487   3.549 0.000387 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 928.53  on 693  degrees of freedom
## Residual deviance: 910.29  on 691  degrees of freedom
## AIC: 916.29
## 
## Number of Fisher Scoring iterations: 4
## 
## 
## 
## --- Lightweight ---
## 
## Call:
## glm(formula = BlueWins ~ ., family = binomial, data = data_subset)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.282757   0.066746  -4.236 2.27e-05 ***
## AgeDif        0.039051   0.013516   2.889  0.00386 ** 
## SigStrDif     0.005105   0.003348   1.525  0.12733    
## AvgSubAttDif  0.146797   0.085361   1.720  0.08548 .  
## AvgTDDif      0.182203   0.040413   4.508 6.53e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1371.1  on 1010  degrees of freedom
## Residual deviance: 1332.4  on 1006  degrees of freedom
## AIC: 1342.4
## 
## Number of Fisher Scoring iterations: 4
## 
## 
## 
## --- Welterweight ---
## 
## Call:
## glm(formula = BlueWins ~ ., family = binomial, data = data_subset)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.24443    0.06595  -3.706 0.000210 ***
## HeightDif    0.02733    0.01071   2.553 0.010681 *  
## AvgTDDif     0.16174    0.04178   3.871 0.000108 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1319.9  on 964  degrees of freedom
## Residual deviance: 1299.2  on 962  degrees of freedom
## AIC: 1305.2
## 
## Number of Fisher Scoring iterations: 4
## 
## 
## 
## --- Middleweight ---
## 
## Call:
## glm(formula = BlueWins ~ ., family = binomial, data = data_subset)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  -0.176503   0.076069  -2.320   0.0203 *
## ReachDif      0.017989   0.009151   1.966   0.0493 *
## SigStrDif     0.007619   0.004212   1.809   0.0705 .
## AvgSubAttDif  0.157309   0.096076   1.637   0.1016  
## AvgTDDif      0.088992   0.050015   1.779   0.0752 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1002.03  on 728  degrees of freedom
## Residual deviance:  984.73  on 724  degrees of freedom
## AIC: 994.73
## 
## Number of Fisher Scoring iterations: 4
## 
## 
## 
## --- Light Heavyweight ---
## 
## Call:
## glm(formula = BlueWins ~ ., family = binomial, data = data_subset)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.22301    0.09553  -2.334 0.019571 *  
## ReachDif     0.05185    0.01426   3.635 0.000278 ***
## HeightDif   -0.01264    0.02022  -0.625 0.531741    
## AgeDif       0.03480    0.01692   2.057 0.039652 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 645.02  on 470  degrees of freedom
## Residual deviance: 620.35  on 467  degrees of freedom
## AIC: 628.35
## 
## Number of Fisher Scoring iterations: 4
## 
## 
## 
## --- Heavyweight ---
## 
## Call:
## glm(formula = BlueWins ~ ., family = binomial, data = data_subset)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.369554   0.098124  -3.766 0.000166 ***
## ReachDif     0.032361   0.009660   3.350 0.000808 ***
## SigStrDif    0.008895   0.005208   1.708 0.087647 .  
## AvgTDDif     0.173706   0.059298   2.929 0.003396 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 621.21  on 461  degrees of freedom
## Residual deviance: 594.08  on 458  degrees of freedom
## AIC: 602.08
## 
## Number of Fisher Scoring iterations: 4

Key Findings

The GLM identifies the specific factors that significantly affect the likelihood of winning. Significance is judged by each coefficient’s p-value in the model output above, at the 0.05 level.

Bantamweight

-P-value for Take down Differential = 0.000267

-For every unit increase in the Take down Differential, the log odds of winning increase by ~0.1545.

-For every unit increase in the Take down Differential, the odds of winning increase by 16.7%.

Featherweight

-P-value for Age Differential = 0.000387

-For every unit increase in Age Differential, the log odds of the blue corner winning increase by ~0.0621. 

-The odds of winning increase by 6.4% per unit increase in Age Differential.

Lightweight

-P-value for Age Differential = 0.00386

-P-value for Take down Differential = 6.53e-6

-For every unit increase in Age Differential and Take down Differential, the log odds of winning increase by ~0.0391 and ~0.1822, respectively. 

-For every unit increase in Age Differential and Take down Differential, the odds of winning increase by ~4.0% and ~20.0%, respectively. 

Welterweight

-P-value for Height Differential = 0.010681

-P-value for Take down Differential = 0.000108

-For every unit increase in Height Differential and Take down Differential, the log odds of winning increase by ~0.0273 and ~0.1617, respectively. 

-For every unit increase in Height Differential and Take down Differential, the odds of winning increase by ~2.8% and ~17.6%, respectively. 

Middleweight

-P-value for Reach Differential = 0.0493

-For every unit increase of Reach Differential, the log odds of victory increased by ~0.0180. 

-For every unit increase of Reach Differential, the odds of victory increased by ~1.8%.

Light Heavyweight

-P-value for Reach Differential = 0.000278

-P-value for Age Differential = 0.039652

-For every unit increase of Reach Differential and Age Differential, the log odds of winning increase by ~0.0519 and ~0.0348, respectively. 

-For every unit increase of Reach Differential and Age Differential, the odds of winning increase by ~5.3% and ~3.5%, respectively. 

Heavyweight

-P-value for Reach Differential = 0.000808

-P-value for Take downs Differential = 0.003396

-For every unit increase in Reach Differential and Take down Differential, the log odds of victory increase by ~0.0324 and ~0.1737, respectively. 

-For every unit increase in Reach Differential and Take down Differential, the odds of winning increase by ~3.3% and ~19.0%, respectively. 
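The percentages above follow from exponentiating each coefficient: a one-unit increase multiplies the odds by exp(beta), so the percent change in odds is (exp(beta) - 1) * 100. The sketch below reproduces that arithmetic for the significant coefficients. The values are copied by hand from the GLM summaries, and the names `log_odds` and `pct_change_in_odds` are introduced here for illustration only.

```r
# Coefficients (log odds) copied by hand from the GLM summaries above.
log_odds <- c(
  "Bantamweight: AvgTDDif"      = 0.15451,
  "Featherweight: AgeDif"       = 0.062059,
  "Lightweight: AgeDif"         = 0.039051,
  "Lightweight: AvgTDDif"       = 0.182203,
  "Welterweight: HeightDif"     = 0.02733,
  "Welterweight: AvgTDDif"      = 0.16174,
  "Middleweight: ReachDif"      = 0.017989,
  "Light Heavyweight: ReachDif" = 0.05185,
  "Light Heavyweight: AgeDif"   = 0.03480,
  "Heavyweight: ReachDif"       = 0.032361,
  "Heavyweight: AvgTDDif"       = 0.173706
)

# Percent change in the odds of a blue-corner win per one-unit increase.
pct_change_in_odds <- (exp(log_odds) - 1) * 100

print(round(pct_change_in_odds, 1))
```

With the fitted model objects at hand, the same numbers could be taken from `exp(coef(glm_model))` instead of hand-copied values.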

Significance and Application

Knowing how much of an advantage one more take down or one more strike than your opponent gives a fighter in a UFC fight matters to many parts of the sport. Coaches can give better real-time advice that could shift the way a fight is going, or use this information to prepare their fighters better. Those involved in sports gambling can make more educated picks, and those in charge of the betting odds can use this information (I know they already do) to shift the odds in favor of the house. Analysts can use it to give more detailed and insightful commentary so that the audience truly understands the weight of each factor in a fight. Although take downs proved to be very important in our analysis, the large increase in a fighter’s odds of winning associated with an advantage in take down differential should be taken with a grain of salt. The very low p-values the GLM returned for take downs are strong evidence that the association is not due to chance, but they do not show that landing more take downs causes wins, and an increase in odds is not the same as an equal increase in win probability.
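One way to put a range around an odds estimate is a Wald 95% confidence interval: take the coefficient plus or minus about 1.96 standard errors on the log-odds scale, then exponentiate. The sketch below does this for the Bantamweight take down coefficient. The estimate and standard error are copied from the summary output above, and the variable names are illustrative, not part of the original analysis.

```r
# Estimate and standard error for AvgTDDif in the Bantamweight model,
# copied by hand from the summary output above.
estimate  <- 0.15451
std_error <- 0.04239

# Wald 95% interval on the log-odds scale.
z <- qnorm(0.975)
ci_log_odds <- estimate + c(-1, 1) * z * std_error

# Exponentiate to express the estimate and interval as odds ratios.
odds_ratio    <- exp(estimate)
ci_odds_ratio <- exp(ci_log_odds)

cat(sprintf("Odds ratio: %.3f (95%% CI %.3f to %.3f)\n",
            odds_ratio, ci_odds_ratio[1], ci_odds_ratio[2]))
```

When the fitted model object is still in memory, `exp(confint.default(glm_model))` should give the same Wald intervals for every coefficient at once.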

Practical Scenarios and Recommendations

Introduction

This analysis provides key insights into what it takes to win a fight in the UFC, the biggest MMA promotion in the world, focusing on key fight metrics and physical attributes of the fighters. These insights aim to enhance coaching, commentary, and overall understanding of the sport.

Fight Tips

Bantamweight:

Control of when the fight goes to the ground is crucial. A one-unit advantage in take down differential is associated with a ~16.7% increase in the odds of winning, and because odds multiply, a two-unit advantage compounds to roughly a 36% increase rather than exactly double. Fighters in this weight class must be adept at all forms of grappling to ensure that they are the ones landing take downs and staying on top. Strikers in this weight class need top-tier take down defense, since allowing a take down puts them at such a statistical disadvantage.
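To see how an increase in odds translates into win probability, the sketch below plugs the Bantamweight intercept and take down coefficient (copied from the GLM output above) into the logistic function for differentials of 0, 1, and 2. It assumes the model is taken at face value for a blue-corner fighter; `td_dif`, `win_prob`, and `odds_multiplier` are names made up for this illustration.

```r
# Bantamweight coefficients copied by hand from the GLM summary above.
intercept <- -0.42085
beta_td   <- 0.15451

# Take down differentials to compare: even, +1, and +2.
td_dif <- 0:2

# Predicted probability of a blue-corner win from the logistic model.
win_prob <- plogis(intercept + beta_td * td_dif)

# Odds at each differential, relative to the odds at a differential of 0.
odds <- win_prob / (1 - win_prob)
odds_multiplier <- odds / odds[1]

print(data.frame(
  td_dif          = td_dif,
  win_prob        = round(win_prob, 3),
  odds_multiplier = round(odds_multiplier, 3)
))
```

The odds multiplier grows by a constant factor per unit, while the change in win probability depends on the starting point, which is why a percent increase in odds should not be read as the same percent increase in the chance of winning.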

Featherweight:

This weight class seems to favor those who are experienced, since age is the only significant factor. Our analysis found that the older fighter actually has a slight advantage, which is contrary to popular belief. Fighters in this weight class should focus on building experience and familiarizing themselves with fighting in the octagon against a wide variety of opponents in order to propel themselves to the top.

Lightweight:

This weight class seems to be a combination of the prior two: it favors the experienced fighter and the one who lands the take downs. A one-unit advantage in take down differential is associated with a ~20% increase in a fighter’s odds of winning in this weight class. This makes sense seeing that the last three champions in this division have been two wrestlers and a BJJ specialist. Fighters in this weight class must be experienced in the octagon and adept at taking their opponent to the ground to achieve the most success.

Welterweight:

Now we begin to see some physical attributes play a significant role, which is expected as you move up in weight. Heavier and larger individuals are better able to leverage their physical attributes to their advantage. Height and take downs appear to be the most influential factors in this division. Taller people are harder to take down, which may be the reason we are seeing these two together. However, taller people are also easier to keep on the ground once they’ve been put there, a double-edged sword. This weight class is historically grappling-heavy, so to compete here one’s wrestling should be top notch, as this weight class is home to many world-class wrestlers and grapplers.

Middleweight:

This weight class, unlike the prior ones, is not historically dominated by grapplers. Instead, the champions of this class have been strikers adept at leveraging their power and natural gifts to control the fight. Reach was found to be the most influential factor in this weight class, suggesting the longer fighter is usually at an advantage. The fighters in this weight class carry a lot of power but aren’t so heavy that they are slow, a happy medium. This has resulted in many crafty strikers making their way up the ranks with their striking and take down defense. Since reach isn’t a trait one can really improve, fighters in this class should focus on improving their positioning and depth perception. They should have means of controlling distance and the pace of the fight in order to find success.

Light Heavyweight:

Our analysis suggests that age and reach are the most influential metrics for victory in this weight class. Reach in these heavier weight classes is crucial, as it allows you to control distance and decide when to initiate. The heavier weight classes have always been kinder to older fighters than the lighter weights, mostly because speed isn’t the main factor at these weights. Fighters in this weight class should fight often to gain experience and should train their positioning and distance management.

Heavyweight:

Lastly, the heavyweights seem to be most influenced by Reach and Take downs. A one-unit advantage in take down differential is associated with a ~19.0% increase in one’s odds of victory. Reach is important for the same reason it is in Light Heavyweight. However, it seems that more heavyweights should invest in wrestling. This weight class is home to many strikers who are just looking for that one-shot knockout, as expected of a heavyweight. Our analysis suggests that a new train of thought may lead to an advantage in this division. Fighters of this weight should be finding ways to secure safe take downs and control the fight on the ground.

Conclusion

-Take downs Landed is a recurring significant factor across many of the weight classes. This indicates that controlling the fight via a successful take down is crucial to victory. Advantages in the take down differential were often associated with double-digit percentage increases in one’s odds of winning.

-Reach seems to be particularly significant at higher weights. This suggests that controlling distance is more crucial as we go up in weight.

-Age is seen to be significant in multiple weight classes, suggesting that experience may be more valuable than youth in this sport.
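The recurring factors named in these conclusions can be tallied directly. The sketch below lists, per weight class, the factors that were significant in the GLM (as reported under Key Findings) and counts how often each appears. The list `glm_significant` is transcribed by hand from those findings and is not an object from the original analysis.

```r
# Factors significant in each weight class's GLM,
# transcribed by hand from the Key Findings above.
glm_significant <- list(
  "Bantamweight"      = c("AvgTDDif"),
  "Featherweight"     = c("AgeDif"),
  "Lightweight"       = c("AgeDif", "AvgTDDif"),
  "Welterweight"      = c("HeightDif", "AvgTDDif"),
  "Middleweight"      = c("ReachDif"),
  "Light Heavyweight" = c("ReachDif", "AgeDif"),
  "Heavyweight"       = c("ReachDif", "AvgTDDif")
)

# Count the number of weight classes in which each factor is significant.
factor_counts <- sort(table(unlist(glm_significant)), decreasing = TRUE)

print(factor_counts)
```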