This project analyzes a dataset of MMA fight decisions to understand how judges score fights. In MMA, three judges evaluate each fight and assign scores to each fighter, but their decisions do not always agree. The dataset includes variables such as the score margins given by each judge, the number of rounds in the fight, and the type of decision (unanimous, split, or majority). It also includes information about which fighter won and how much the judges' scores differed. The goal of this project is to explore how factors like score differences, fight length, and decision type influence disagreement between judges.
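Unanimous decision: all three judges score the fight for the same fighter
Split decision: two judges score the fight for one fighter and the third scores it for the other
Majority decision: two judges score the fight for one fighter and the third scores it a draw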
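As a rough illustration of these three categories, the sketch below labels a decision from three judges' score margins. The function name classify_decision and the sign convention (a positive margin favours fighter 1, a negative margin favours fighter 2, and zero means the judge scored a draw) are assumptions made for this example, not columns or rules taken from the dataset.

```r
# Hypothetical helper: label a decision from three judges' margins.
# Assumed convention: margin > 0 favours fighter 1, < 0 favours fighter 2,
# and 0 means the judge scored the fight a draw.
classify_decision <- function(margins) {
  stopifnot(length(margins) == 3)
  votes <- sign(margins)            # 1, -1, or 0 for each judge
  n_f1 <- sum(votes == 1)
  n_f2 <- sum(votes == -1)
  n_draw <- sum(votes == 0)

  if (max(n_f1, n_f2) == 3) {
    "Unanimous"                     # all three judges pick the same fighter
  } else if (max(n_f1, n_f2) == 2 && n_draw == 1) {
    "Majority"                      # two pick one fighter, one scores a draw
  } else if (max(n_f1, n_f2) == 2) {
    "Split"                         # two pick one fighter, one picks the other
  } else {
    "Other"                         # anything else, such as a drawn fight
  }
}

classify_decision(c(1, 3, 1))
classify_decision(c(1, -1, 1))
classify_decision(c(1, 0, 2))
```

The three calls return "Unanimous", "Split", and "Majority" in that order.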
Source: ESPN
library(tidyverse)
Warning: package 'ggplot2' was built under R version 4.5.2
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.1 ✔ stringr 1.5.2
✔ ggplot2 4.0.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggfortify)
Warning: package 'ggfortify' was built under R version 4.5.2
mma <- read_csv("mma_decisions.csv")
Rows: 5000 Columns: 32
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (18): event, arena, city, fighter1, fighter2, result_type, judge1, judg...
dbl (13): judge1_score1, judge1_score2, judge2_score1, judge2_score2, judge...
date (1): date
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Create new variables for analysis
mma_clean1 <- mma_clean |>
  mutate(
    avg_dev = (judge1_dev + judge2_dev + judge3_dev) / 3,
    # A margin (the difference between a judge's two scores) can be positive
    # or negative, so abs() is applied to each one before summing.
    total_margin = abs(judge1_margin) + abs(judge2_margin) + abs(judge3_margin)
  )
# Create a new variable by grouping everything into "Agree" or "Disagree"
mma_clean2 <- mma_clean1 |>
  mutate(agreement_simple = ifelse(agreement == "Agree", "Agree", "Disagree"))
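To see why abs() matters in total_margin, consider a made-up fight in which two judges favour one fighter by a point and the third favours the other fighter by a point. The values below are illustrative only and do not come from the dataset.

```r
# Made-up margins for one fight (assumed: fighter 1's score minus fighter 2's).
margins <- c(1, -1, 1)

sum(margins)       # signed margins partly cancel out: 1
sum(abs(margins))  # absolute margins count every judge's gap: 3
```

Without abs(), a close but contested fight could look closer than it was, because opposing margins offset each other.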
The model has the equation: avg_dev = 0.0359(total_margin) + 0.2055(rounds5) + 1.0421(rounds12) + 0.5993(result_typeSplit) − 0.6682(result_typeUnanimous) + 0.9859
The slope for total_margin may be interpreted as follows: for each one-unit increase in total_margin, the predicted average judge deviation increases by about 0.036, holding all other variables constant.
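As a worked example, the fitted equation can be evaluated by hand for a specific fight. The sketch below assumes that 3-round fights and majority decisions are the baseline categories (they have no coefficient in the equation) and uses made-up input values. The helper predict_avg_dev is hypothetical and is not part of the original analysis.

```r
# Coefficients copied from the fitted equation above.
coefs <- c(
  intercept = 0.9859,
  total_margin = 0.0359,
  rounds5 = 0.2055,
  rounds12 = 1.0421,
  split = 0.5993,
  unanimous = -0.6682
)

# Hypothetical helper: plug one fight into the equation.
# The logical comparisons act as 0/1 indicator variables.
predict_avg_dev <- function(total_margin, rounds, result_type) {
  unname(
    coefs["intercept"] +
      coefs["total_margin"] * total_margin +
      coefs["rounds5"] * (rounds == 5) +
      coefs["rounds12"] * (rounds == 12) +
      coefs["split"] * (result_type == "Split") +
      coefs["unanimous"] * (result_type == "Unanimous")
  )
}

# A 3-round split decision with a total margin of 3.
predict_avg_dev(total_margin = 3, rounds = 3, result_type = "Split")

# A 5-round unanimous decision with a total margin of 6.
predict_avg_dev(total_margin = 6, rounds = 5, result_type = "Unanimous")
```

Under these assumptions the two predictions are about 1.69 and 0.74, which matches the pattern that split decisions carry more judge deviation than unanimous ones.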
model2 <- lm(avg_dev ~ total_margin + result_type, data = mma_final)
summary(model2)
Call:
lm(formula = avg_dev ~ total_margin + result_type, data = mma_final)
Residuals:
    Min      1Q  Median      3Q     Max
-1.0667 -0.4119 -0.3699  0.7468  2.5705
Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)           1.020280   0.053444   19.09   <2e-16 ***
total_margin          0.043653   0.003067   14.23   <2e-16 ***
result_typeSplit      0.552011   0.056272    9.81   <2e-16 ***
result_typeUnanimous -0.739347   0.055250  -13.38   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6305 on 4996 degrees of freedom
Multiple R-squared: 0.3777, Adjusted R-squared: 0.3773
F-statistic: 1011 on 3 and 4996 DF, p-value: < 2.2e-16
autoplot(model2, 1:4, nrow = 2, ncol = 2)
Of the three predictors, rounds had the highest p-value, so it was dropped from the second model. R-squared fell from 0.3891 to 0.3777, so the model explains slightly less of the variation in avg_dev without rounds. This suggests that rounds still makes a small contribution to explaining judge disagreement.
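Because R-squared cannot increase when a predictor is dropped, adjusted R-squared gives a fairer comparison of the two models. The sketch below applies the standard formula to the reported R-squared values. It assumes both models were fitted to the same 5000 fights, and that the first model has five slope coefficients (as in its equation) while the second has three.

```r
# Adjusted R-squared from R-squared, sample size n, and number of slopes p.
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n <- 5000  # assumed: the same rows were used for both models

round(adj_r2(0.3891, n, p = 5), 4)  # first model, with rounds
round(adj_r2(0.3777, n, p = 3), 4)  # second model, without rounds
```

Under these assumptions the results are 0.3885 and 0.3773. The second value matches the adjusted R-squared printed in the summary of model2, and the gap between the two models stays at roughly 0.011 after the adjustment.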
ggplot(mma_final, aes(x = result_type, y = avg_dev, fill = result_type)) +
  geom_boxplot() +
  facet_wrap(~ rounds) +
  labs(
    title = "Judge Disagreement by Decision Type and Fight Length",
    x = "Result Type",
    y = "Average Judge Deviation",
    fill = "Result Type",
    caption = "Source: ESPN"
  ) +
  scale_fill_manual(values = c("darkred", "darkblue", "darkgreen")) +
  theme_minimal(base_size = 12)
unique(mma_final$rounds)
[1] 3 5 12
Levels: 3 5 12
# Working on 3 and 5 rounds
mma_plot1 <- mma_final |>
  filter(rounds %in% c("3", "5"))
ggplot(mma_plot1, aes(x = result_type, y = avg_dev, fill = result_type)) +
  geom_boxplot() +
  facet_wrap(~ rounds) +
  labs(
    title = "Judge Disagreement by Decision Type and Fight Length",
    x = "Result Type",
    y = "Average Judge Deviation",
    fill = "Result Type",
    caption = "ESPN"
  ) +
  scale_fill_manual(values = c("darkred", "darkblue", "darkgreen")) +
  theme_minimal(base_size = 12)
Reflection
The dataset was cleaned by removing rows with missing values in key variables using filter(). New variables were created using mutate(), including avg_dev to measure judge disagreement and total_margin to represent fight closeness. The agreement variable was simplified into “Agree” and “Disagree”, and only fights with 3 or 5 rounds were kept for the final boxplot.
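The filtering steps described above can be sketched on a small made-up table. The column names follow the ones used in this report, but the values, the choice of key variables, and the object names toy and toy_clean are assumptions for illustration. The original cleaning code may differ.

```r
library(dplyr)

# Made-up data: four fights, one with a missing margin and one 12-round fight.
toy <- data.frame(
  rounds = factor(c(3, 5, 12, 3), levels = c(3, 5, 12)),
  judge1_margin = c(1, -2, 3, NA),
  judge2_margin = c(1, 1, 2, 1),
  judge3_margin = c(-1, 2, 4, 1)
)

toy_clean <- toy |>
  # Drop rows with a missing value in any of the (assumed) key variables.
  filter(!is.na(judge1_margin), !is.na(judge2_margin), !is.na(judge3_margin)) |>
  # Keep only 3- and 5-round fights, then drop the now-empty "12" level.
  filter(rounds %in% c("3", "5")) |>
  droplevels()

toy_clean
levels(toy_clean$rounds)
```

Calling droplevels() after the filter is optional for plotting, but it keeps the unused "12" level from reappearing in later summaries or models.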
The boxplot shows judge disagreement across decision types and fight lengths. Split decisions have the highest disagreement, majority decisions are moderate, and unanimous decisions have the lowest. Fights with more rounds also tend to show slightly higher disagreement, suggesting that longer fights are harder to judge.
A scatterplot was considered but showed a weak relationship, so a boxplot was used instead. A limitation of this analysis is that it does not include detailed fighter performance data, which could help better explain judge disagreement.