
# Summary

The dataset I am working with covers the UFC, the premier MMA promotion, where the best fighters compete for championship belts. It is a comprehensive list of all fights that have taken place since 2010 and contains statistics and insights into each of them. Winners, differences in physical build, and the rates at which fighters land strikes, takedowns, and submissions are all documented in the dataset.

# Dataset Link

https://www.kaggle.com/datasets/mdabbert/ultimate-ufc-dataset

Documentation and descriptions of each column can be found at the link above. Scroll down to the preview of the dataset to see the description of each column.

# Main Goal

The main goal of this project is to find which factor influences victory the most. Is it the rate and efficiency with which a fighter lands strikes? Is it a fighter's physical build? Or is it how active they are in the ground game, through takedowns and submission attempts?

# Visualizations and Initial Findings

The graphs below show the typical values for each statistic and what counts as exceptional performance. This matters because I want to relate victory to reaching a certain level in each statistic. At what rate does a fighter have to land strikes to influence victory? How many submissions must a fighter attempt for victory to become more likely? I want to answer these questions for all of the statistics below, so understanding their averages and their outliers is important to my project.

# Hypotheses

Hypothesis #1: A fighter must land an average of 9 or more significant strikes per minute AND land more than 65% of the strikes they throw in order to have a substantial influence on whether they win the fight.

Hypothesis #2: A fighter must complete 3 takedowns over 15 minutes AND land 60% of their takedown attempts in order to sway the fight in their favor.
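
The two hypotheses above can be expressed as logical flag columns, which makes it easy to compare win rates between fighters who meet the thresholds and those who do not. This is only a minimal sketch on made-up numbers: the data frame `fighters`, its column names, and the `won` column are hypothetical, and it assumes percentages are stored as fractions between 0 and 1. It uses base R only so it runs without extra packages.

```r
# Made-up data: one row per fighter in a fight (all names are hypothetical).
fighters <- data.frame(
  sig_str_per_min = c(9.5, 4.2, 10.1, 3.8, 9.0, 5.5),
  sig_str_pct     = c(0.70, 0.45, 0.60, 0.50, 0.66, 0.40),
  td_per_15       = c(1.0, 3.5, 0.0, 3.0, 2.0, 4.0),
  td_pct          = c(0.30, 0.65, 0.00, 0.60, 0.50, 0.55),
  won             = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
)

# Hypothesis 1: 9+ significant strikes per minute AND accuracy above 65%.
fighters$meets_h1 <- fighters$sig_str_per_min >= 9 & fighters$sig_str_pct > 0.65

# Hypothesis 2: 3+ takedowns per 15 minutes AND at least 60% takedown accuracy.
fighters$meets_h2 <- fighters$td_per_15 >= 3 & fighters$td_pct >= 0.60

# Win rate (share of TRUE in `won`) for fighters who do / do not meet each criterion.
h1_win_rates <- tapply(fighters$won, fighters$meets_h1, mean)
h2_win_rates <- tapply(fighters$won, fighters$meets_h2, mean)

print(h1_win_rates)
print(h2_win_rates)
```

On the real dataset the same comparison would be run on the filtered fights, and the thresholds (9, 65%, 3, 60%) can be adjusted once the actual distributions are known.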

Is there a significant relationship between the strike differential and victory? What about the takedown differential or the submission differential?
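
One way to look at these questions is to compute each differential as the red corner's average minus the blue corner's average and then check how often the fighter with the advantage won. The sketch below uses made-up values. The `Red...` and `Blue...` column names match the ones used later in this document, but the `Winner` column with values "Red" and "Blue" is an assumption about the dataset, and the derived column names are hypothetical.

```r
# Made-up data for four fights; Winner ("Red"/"Blue") is an assumed column.
fights <- data.frame(
  RedAvgSigStrLanded  = c(5.0, 3.2, 4.1, 6.0),
  BlueAvgSigStrLanded = c(3.5, 4.0, 4.1, 2.5),
  RedAvgTDLanded      = c(1.0, 2.5, 0.5, 0.0),
  BlueAvgTDLanded     = c(2.0, 0.5, 0.5, 1.5),
  Winner              = c("Red", "Blue", "Red", "Blue")
)

# Differentials from the red corner's point of view (positive = red advantage).
fights$SigStrDiff <- fights$RedAvgSigStrLanded - fights$BlueAvgSigStrLanded
fights$TDDiff     <- fights$RedAvgTDLanded - fights$BlueAvgTDLanded

fights$RedWon <- fights$Winner == "Red"

# Did the fighter with the striking advantage win? Ties are set to NA.
fights$StrikeLeaderWon <- ifelse(
  fights$SigStrDiff == 0,
  NA,
  (fights$SigStrDiff > 0) == fights$RedWon
)

# Share of non-tied fights won by the fighter with the striking advantage.
strike_leader_win_rate <- mean(fights$StrikeLeaderWon, na.rm = TRUE)
print(strike_leader_win_rate)
```

The same pattern applies to the takedown and submission differentials by swapping in the corresponding column.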

# Project Going Forward

Next, I will calculate the correlation between certain statistics and victory. I will do this by converting methods of victory into numerical values and using a Pearson correlation to see whether there is a positive correlation between the two. I don’t expect to find a strong correlation for any one statistic because of the random nature of fighting, so I will do the same thing with the fight differentials to get a more detailed look and determine whether one differential influences a fight more than the others.

Using this information, I will adjust the numbers in my hypotheses and then create new columns that show whether or not a fighter met the criteria in each hypothesis. I will then compare the win rate of fighters who meet my criteria with the win rate of those who do not. I plan to do this with logistic regression, which I believe we will cover in the next lecture. Chi-squared tests or cross-tabs may also work for this comparison.
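
The plan above could look roughly like the following. This is a sketch under stated assumptions, not the final analysis: all numbers are made up, the names `toy`, `won`, `meets_criteria`, and `won_fight` are hypothetical, and the outcome is simplified to 1 for a win and 0 for a loss. With a 0/1 outcome, the Pearson correlation is the same as the point-biserial correlation.

```r
# Made-up data: strikes landed per minute and a 0/1 win indicator.
toy <- data.frame(
  sig_str_per_min = c(2.1, 3.4, 4.0, 4.8, 5.5, 3.0, 6.2, 2.8, 5.0, 4.4, 3.9, 5.8),
  won             = c(0,   0,   1,   0,   1,   1,   1,   0,   1,   0,   0,   1)
)

# Pearson correlation between the statistic and the numeric outcome.
r_strikes <- cor(toy$sig_str_per_min, toy$won, method = "pearson")

# Logistic regression: models the log-odds of winning as a linear function
# of strikes landed per minute.
fit <- glm(won ~ sig_str_per_min, data = toy, family = binomial)
slope <- coef(fit)[["sig_str_per_min"]]
odds_ratio <- exp(slope)  # multiplicative change in odds per extra strike/min

# Cross-tab and chi-squared test: does meeting a criterion go with winning?
# Made-up counts: 40 fighters meet the criterion, 60 do not.
meets_criteria <- rep(c(TRUE, FALSE), times = c(40, 60))
won_fight <- c(rep(c(TRUE, FALSE), times = c(28, 12)),
               rep(c(TRUE, FALSE), times = c(27, 33)))
cross_tab <- table(meets_criteria, won_fight)
chi <- chisq.test(cross_tab)

print(r_strikes)
print(odds_ratio)
print(cross_tab)
print(chi$p.value)
```

For a 2x2 table, `chisq.test` applies Yates' continuity correction by default, and with very small cell counts it warns that the approximation may be unreliable, so the cross-tab should be checked before the p-value is trusted.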

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(readxl)

dataset <- read_excel("~/Downloads/UFC_Dataset.xls")

#Filtering Dataset for fights in the Men's Division after 2020
dataset <- dataset |>
  filter(Gender == "MALE" & as.Date(Date, format = "%Y,%m,%d") > as.Date("2020-12-31"))

#Combining Important Statistics so that we can visualize them comprehensively 
combined_stats <- dataset |>
  select(
    RedAvgSigStrLanded, BlueAvgSigStrLanded,
    RedAvgSubAtt, BlueAvgSubAtt,
    RedAvgSigStrPct, BlueAvgSigStrPct,
    RedAvgTDLanded, BlueAvgTDLanded,
    RedAvgTDPct, BlueAvgTDPct
  ) |>
  mutate(
    AvgsigStrLanded = (RedAvgSigStrLanded + BlueAvgSigStrLanded) / 2,
    AvgSubAtt = (RedAvgSubAtt + BlueAvgSubAtt) / 2,
    AvgSigStrPct = (RedAvgSigStrPct + BlueAvgSigStrPct) / 2,
    AvgTdLanded = (RedAvgTDLanded + BlueAvgTDLanded) / 2,
    AvgTdPct = (RedAvgTDPct + BlueAvgTDPct) / 2
  ) |>
  select(AvgsigStrLanded, AvgSubAtt, AvgSigStrPct, AvgTdLanded, AvgTdPct)
#Boxplot of Average Significant Strikes landed
ggplot(combined_stats, aes(x = "", y = AvgsigStrLanded)) +
  geom_boxplot(fill = "skyblue", color = "black") +
  labs(
    title = "Box Plot of Average Significant Strikes Landed per Minute",
    x = NULL,
    y = "Average Significant Strikes Landed"
  ) +
  theme_minimal() +
  coord_flip()

#Boxplot for average significant strike percentage
ggplot(combined_stats, aes(x = NULL, y = AvgSigStrPct)) +
  geom_boxplot(fill = "skyblue", color = "black") +
  labs(
    title = "Box Plot of Average Significant Strike Percentage",
    y = "Average Significant Strike Percentage"
  ) +
  theme_minimal() +
  coord_flip()

#Boxplot for average submission attempts per 15min
ggplot(combined_stats, aes(x = NULL, y = AvgSubAtt)) +
  geom_boxplot(fill = "lightgreen", color = "black") +
  labs(
    title = "Box Plot of Average Submission Attempts per 15 min",
    y = "Average Submission Attempts"
  ) +
  theme_minimal() +
  coord_flip()

#Boxplot for average Takedowns landed
ggplot(combined_stats, aes(x = NULL, y = AvgTdLanded)) +
  geom_boxplot(fill = "plum", color = "black") +
  labs(
    title = "Box Plot of Average Takedowns Landed per 15min",
    y = "Average Takedowns Landed"
  ) +
  theme_minimal() +
  coord_flip()

#Boxplot for average takedown percentage
ggplot(combined_stats, aes(x = NULL, y = AvgTdPct)) +
  geom_boxplot(fill = "plum", color = "black") +
  labs(
    title = "Box Plot of Average Takedown Percentage",
    y = "Average Takedown Percentage"
  ) +
  theme_minimal() +
  coord_flip()
