# Main Goal
The main goal of this project is to find which factor influences victory the most. Is it the rate and efficiency with which a fighter lands his strikes that has the most influence over winning a fight? Is a fighter's physical composition the most important factor? Or is it how active he is in the ground game, through takedowns and submission attempts, that matters most?
# Project Going Forward
Next, I will calculate the correlation between certain statistics and victory. I will do this by converting methods of victory into numerical values and using a Pearson correlation to see whether each statistic is positively correlated with winning. I don’t expect to find a strong correlation for any single statistic because of the random nature of fighting, so I will repeat the analysis with fight differentials (the gap between the two opponents in a given statistic) to get a more detailed look and determine whether one differential influences the outcome more than the others. Using this information, I will adjust the numbers in my hypothesis and then create new columns that show whether or not a fighter meets its criteria. Finally, I will compare the win rate of fighters who meet my criteria with that of fighters who do not. I plan to do this with logistic regression, which I believe we will cover in the next lecture; chi-squared tests or cross-tabulations may also work for this comparison.
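The analysis plan above can be sketched in R as follows. This is a minimal illustration on simulated data, not the project's actual method or results. The outcome column `red_win` (1 if the Red corner won, 0 otherwise), the differential columns `SigStrDiff` and `TDDiff` (Red minus Blue), and the `meets_criteria` threshold are all hypothetical. With a 0/1 outcome, Pearson's `cor()` gives the point-biserial correlation. `glm(..., family = binomial)` fits the logistic regression, and `table()` plus `chisq.test()` produce the cross-tab and chi-squared test.

```r
# SIMULATED data for illustration only; these are not real UFC results.
# Hypothetical assumptions:
#   - red_win = 1 if the Red corner won, 0 otherwise
#   - differentials are Red minus Blue for each fight
#   - the meets_criteria threshold is a placeholder
set.seed(1)
n <- 200
fights <- data.frame(
  RedAvgSigStrLanded  = rnorm(n, mean = 4, sd = 1.5),
  BlueAvgSigStrLanded = rnorm(n, mean = 4, sd = 1.5),
  RedAvgTDLanded      = rexp(n, rate = 1),
  BlueAvgTDLanded     = rexp(n, rate = 1)
)

# Differentials: positive values favour the Red corner
fights$SigStrDiff <- fights$RedAvgSigStrLanded - fights$BlueAvgSigStrLanded
fights$TDDiff     <- fights$RedAvgTDLanded - fights$BlueAvgTDLanded

# Simulated outcome, loosely tied to the striking differential
fights$red_win <- rbinom(n, size = 1, prob = plogis(0.5 * fights$SigStrDiff))

# Pearson correlation of each differential with the 0/1 outcome
diff_cols <- c("SigStrDiff", "TDDiff")
correlations <- sapply(diff_cols, function(col) cor(fights[[col]], fights$red_win))
print(round(correlations, 3))
print(cor.test(fights$SigStrDiff, fights$red_win))  # significance test for one differential

# Logistic regression: log-odds of a Red win as a function of the differentials
model <- glm(red_win ~ SigStrDiff + TDDiff, data = fights, family = binomial)
print(summary(model))

# Criteria column, cross-tab of criteria vs. outcome, and chi-squared test
fights$meets_criteria <- fights$SigStrDiff > 1  # hypothetical threshold
tab <- table(meets_criteria = fights$meets_criteria, red_win = fights$red_win)
print(tab)
print(prop.table(tab, margin = 1))  # win rate within each criteria group
print(chisq.test(tab))
```

On the real data, the simulated `fights` frame would be replaced by `dataset`, and `red_win` derived from whichever column records the winning corner (for example `as.integer(Winner == "Red")`, assuming a `Winner` column with that coding).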
```r
library(dplyr)
```
```r
library(ggplot2)
library(readxl)

# Load the UFC fight data from the Excel file
dataset <- read_excel("~/Downloads/UFC_Dataset.xls")
```
```r
# Keep only fights in the men's divisions that took place after 2020
dataset <- dataset |>
  filter(Gender == "MALE", as.Date(Date) > as.Date("2020-12-31"))
```
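The date filter only works if `as.Date()` actually parses the `Date` column. `dplyr::filter()` drops rows where the condition is `NA`, so a format string that does not match the stored text would silently remove every fight. The check below uses made-up date strings, because the column's exact storage type is not shown here. For Excel date cells, `readxl` returns `POSIXct` date-times, which `as.Date()` converts without needing a format string.

```r
# Made-up example dates; the real Date column may instead be read as POSIXct
dates_text <- c("2020-11-14", "2021-03-20")

# A comma-separated format does not match dash-separated text, so parsing yields NA
bad <- as.Date(dates_text, format = "%Y,%m,%d")
print(bad)

# The default ISO format (%Y-%m-%d) parses the same strings correctly
good <- as.Date(dates_text)
print(good > as.Date("2020-12-31"))

# POSIXct date-times convert directly with as.Date()
dt <- as.POSIXct("2021-03-20 00:00:00", tz = "UTC")
print(as.Date(dt) > as.Date("2020-12-31"))
```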
```r
# Combine the key statistics so they can be visualised together:
# each statistic is averaged across the Red and Blue corners, giving one value per fight
# (if either corner's value is missing, the average is NA)
combined_stats <- dataset |>
  select(
    RedAvgSigStrLanded, BlueAvgSigStrLanded,
    RedAvgSubAtt, BlueAvgSubAtt,
    RedAvgSigStrPct, BlueAvgSigStrPct,
    RedAvgTDLanded, BlueAvgTDLanded,
    RedAvgTDPct, BlueAvgTDPct
  ) |>
  mutate(
    AvgsigStrLanded = (RedAvgSigStrLanded + BlueAvgSigStrLanded) / 2,
    AvgSubAtt = (RedAvgSubAtt + BlueAvgSubAtt) / 2,
    AvgSigStrPct = (RedAvgSigStrPct + BlueAvgSigStrPct) / 2,
    AvgTdLanded = (RedAvgTDLanded + BlueAvgTDLanded) / 2,
    AvgTdPct = (RedAvgTDPct + BlueAvgTDPct) / 2
  ) |>
  select(AvgsigStrLanded, AvgSubAtt, AvgSigStrPct, AvgTdLanded, AvgTdPct)
```
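Each box plot draws the first quartile, median, and third quartile of a column, so printing those numbers, along with the minimum and maximum, gives exact values to cite next to the figures. The sketch below uses a small made-up data frame, `toy_stats` (hypothetical values), that reuses two column names from `combined_stats`. On the real data, `combined_stats` would be passed in its place.

```r
# Hypothetical stand-in for combined_stats (made-up values, not UFC data)
toy_stats <- data.frame(
  AvgsigStrLanded = c(2.1, 3.4, 3.9, 4.2, 5.0, 7.8),
  AvgSubAtt       = c(0.0, 0.2, 0.4, 0.5, 0.9, NA)
)

# Min, Q1, median, Q3, max for every column (missing values dropped)
five_num <- sapply(toy_stats, quantile, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
print(round(five_num, 2))

# Missing values per column; ggplot2 removes these (with a warning) when plotting
print(colSums(is.na(toy_stats)))
```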
```r
# Box plot of average significant strikes landed per minute
ggplot(combined_stats, aes(x = "", y = AvgsigStrLanded)) +
  geom_boxplot(fill = "skyblue", color = "black") +
  labs(
    title = "Box Plot of Average Significant Strikes Landed per Minute",
    x = NULL,
    y = "Average Significant Strikes Landed"
  ) +
  theme_minimal() +
  coord_flip()
```

```r
# Box plot of average significant strike percentage
ggplot(combined_stats, aes(x = NULL, y = AvgSigStrPct)) +
  geom_boxplot(fill = "skyblue", color = "black") +
  labs(
    title = "Box Plot of Average Significant Strike Percentage",
    y = "Average Significant Strike Percentage"
  ) +
  theme_minimal() +
  coord_flip()
```

```r
# Box plot of average submission attempts per 15 minutes
ggplot(combined_stats, aes(x = NULL, y = AvgSubAtt)) +
  geom_boxplot(fill = "lightgreen", color = "black") +
  labs(
    title = "Box Plot of Average Submission Attempts per 15 min",
    y = "Average Submission Attempts"
  ) +
  theme_minimal() +
  coord_flip()
```

```r
# Box plot of average takedowns landed per 15 minutes
ggplot(combined_stats, aes(x = NULL, y = AvgTdLanded)) +
  geom_boxplot(fill = "plum", color = "black") +
  labs(
    title = "Box Plot of Average Takedowns Landed per 15 min",
    y = "Average Takedowns Landed"
  ) +
  theme_minimal() +
  coord_flip()
```

```r
# Box plot of average takedown percentage
ggplot(combined_stats, aes(x = NULL, y = AvgTdPct)) +
  geom_boxplot(fill = "plum", color = "black") +
  labs(
    title = "Box Plot of Average Takedown Percentage",
    y = "Average Takedown Percentage"
  ) +
  theme_minimal() +
  coord_flip()
```
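In `geom_boxplot()`, the points drawn individually beyond the whiskers are values more than 1.5 times the IQR below the first quartile or above the third quartile (the `coef` argument of `stat_boxplot()`). The hypothetical helper `flag_outliers()` below applies the same rule so those fights can be listed rather than only spotted in a plot. The input vector is made up for illustration.

```r
# Flag values that geom_boxplot would draw as outlier points:
# anything more than coef * IQR beyond the first or third quartile.
# Missing values are never flagged.
flag_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  !is.na(x) & (x < q[[1]] - coef * iqr | x > q[[2]] + coef * iqr)
}

# Made-up takedown averages, for illustration only
td_landed <- c(0.0, 0.4, 0.8, 1.1, 1.3, 1.6, 2.0, 6.5)
print(td_landed[flag_outliers(td_landed)])  # 6.5
```

On the real data, `combined_stats[flag_outliers(combined_stats$AvgTdLanded), ]` would return the fights shown as outliers in the takedowns plot.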
