Data Dive #4

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
library(readxl)

dataset <- read_excel("~/Downloads/UFC_Dataset.xls")

# Keep only fights in the men's division dated after 2020
# (if Date is already a date-time column, as.Date() ignores `format`)
dataset <- dataset |>
  filter(Gender == "MALE", as.Date(Date, format = "%Y,%m,%d") > as.Date("2020-12-31"))

# Take 5 random samples (each half the number of rows, drawn with replacement)
sample_1 <- dataset |> sample_frac(size = 0.5, replace = TRUE)
sample_2 <- dataset |> sample_frac(size = 0.5, replace = TRUE)
sample_3 <- dataset |> sample_frac(size = 0.5, replace = TRUE)
sample_4 <- dataset |> sample_frac(size = 0.5, replace = TRUE)
sample_5 <- dataset |> sample_frac(size = 0.5, replace = TRUE)

# Store samples in separate data frames
df_1 <- sample_1
df_2 <- sample_2
df_3 <- sample_3
df_4 <- sample_4
df_5 <- sample_5

# Find mean of each of the Samples
summary_1 <- df_1 |> group_by(Gender) |> summarise(BlueAvgSigStrLanded = mean(BlueAvgSigStrLanded, na.rm = TRUE))
summary_2 <- df_2 |> group_by(Gender) |> summarise(BlueAvgSigStrLanded = mean(BlueAvgSigStrLanded, na.rm = TRUE))
summary_3 <- df_3 |> group_by(Gender) |> summarise(BlueAvgSigStrLanded = mean(BlueAvgSigStrLanded, na.rm = TRUE))
summary_4 <- df_4 |> group_by(Gender) |> summarise(BlueAvgSigStrLanded = mean(BlueAvgSigStrLanded, na.rm = TRUE))
summary_5 <- df_5 |> group_by(Gender) |> summarise(BlueAvgSigStrLanded = mean(BlueAvgSigStrLanded, na.rm = TRUE))

# Print the sample means
print(summary_1)

## # A tibble: 1 × 2
##   Gender BlueAvgSigStrLanded
##   <chr>                <dbl>
## 1 MALE                  4.01

print(summary_2)

## # A tibble: 1 × 2
##   Gender BlueAvgSigStrLanded
##   <chr>                <dbl>
## 1 MALE                  4.03

print(summary_3)

## # A tibble: 1 × 2
##   Gender BlueAvgSigStrLanded
##   <chr>                <dbl>
## 1 MALE                  4.12

print(summary_4)

## # A tibble: 1 × 2
##   Gender BlueAvgSigStrLanded
##   <chr>                <dbl>
## 1 MALE                  3.96

print(summary_5)

## # A tibble: 1 × 2
##   Gender BlueAvgSigStrLanded
##   <chr>                <dbl>
## 1 MALE                  4.06

Scrutinize Samples

The mean volume of strikes landed (BlueAvgSigStrLanded) in the men's fights is fairly consistent across all five samples, ranging from 3.96 to 4.12, and it matches the visualizations I've made in other assignments. The means aren't identical, though: the biggest gap is between the third and fourth samples (4.12 vs. 3.96). I'd expect similar consistency for other important statistics like submission attempts and takedown attempts. However, if we did the same for a column like win streak or loss streak, I believe the results would vary more from sample to sample, since there are many more outliers to consider in a category like this.

This procedure doesn't seem all that useful for my project. I want to draw a relationship between performance in a certain metric and victory, so having all the data rather than samples of it is more beneficial to my goals. Still, many interesting statistics can be drawn from sampling. For example, we could sample from different "eras" of the UFC in 3-year increments to see how the game has evolved.
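One way to check the guess about streak columns is to repeat the five-sample procedure for two columns and compare how much the sample means move around relative to their average. The sketch below is only an illustration: it uses base R and a synthetic data frame (`fights`) with made-up values, and the column name `BlueCurrentWinStreak` and the helpers `sample_means` and `relative_spread` are assumptions, not part of the analysis above.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

# Synthetic stand-in for the filtered fight data (all values are made up).
fights <- data.frame(
  BlueAvgSigStrLanded  = rnorm(2000, mean = 4, sd = 1.5),  # roughly symmetric
  BlueCurrentWinStreak = rgeom(2000, prob = 0.5)           # skewed, long right tail
)

# Hypothetical helper: draw `n_samples` resamples (a fraction `frac` of the
# rows, with replacement) and return the mean of `column` in each one.
sample_means <- function(data, column, n_samples = 5, frac = 0.5) {
  n_rows <- round(nrow(data) * frac)
  vapply(seq_len(n_samples), function(i) {
    rows <- sample(nrow(data), size = n_rows, replace = TRUE)
    mean(data[[column]][rows], na.rm = TRUE)
  }, numeric(1))
}

strike_means <- sample_means(fights, "BlueAvgSigStrLanded")
streak_means <- sample_means(fights, "BlueCurrentWinStreak")

# Spread of the sample means relative to their average, so that columns
# measured on different scales can be compared.
relative_spread <- function(x) sd(x) / mean(x)

print(round(c(strikes = relative_spread(strike_means),
              streaks = relative_spread(streak_means)), 4))
```

With only five samples the comparison is noisy, so raising `n_samples` gives a steadier picture. Whether the streak columns really vary more has to be checked on the real dataset.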

# Monte Carlo simulation for Blue fighters' significant strikes landed:
# repeat the half-size resample 1,000 times and keep each resample's mean
monte_carlo_results <- replicate(1000, {
  # named `resampled` to avoid masking base::sample()
  resampled <- dataset |> sample_frac(size = 0.5, replace = TRUE)
  mean(resampled$BlueAvgSigStrLanded, na.rm = TRUE)
})

# replicate() already returns a numeric vector, so no conversion is needed
hist(monte_carlo_results, main = "Monte Carlo Sim of Significant Strikes Landed (BLUE)", xlab = "Avg Strikes Landed")

# Monte Carlo simulation for Red fighters' significant strikes landed
# (this overwrites the Blue results stored in monte_carlo_results)
monte_carlo_results <- replicate(1000, {
  # named `resampled` to avoid masking base::sample()
  resampled <- dataset |> sample_frac(size = 0.5, replace = TRUE)
  mean(resampled$RedAvgSigStrLanded, na.rm = TRUE)
})

# replicate() already returns a numeric vector, so no conversion is needed
hist(monte_carlo_results, main = "Monte Carlo Sim of Significant Strikes Landed (RED)", xlab = "Avg Strikes Landed")

I might want to apply this Monte Carlo simulation to the statistics that I find are significant to winning, which I believe I will learn how to do next week. In the meantime, these Monte Carlo simulations have done a better job of visualizing the means of my data, so I might need to run this for other statistics as well.
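If the simulation does get reused for other statistics, wrapping it in a function avoids copying the block once per column. The following is a minimal sketch in base R under stated assumptions: `fights` is a synthetic stand-in with made-up values, and `simulate_mean` is a hypothetical helper that mirrors the half-size resampling used above.

```r
set.seed(2024)  # fixed seed so the sketch is reproducible

# Synthetic stand-in for the filtered fight data (all values are made up).
fights <- data.frame(
  BlueAvgSigStrLanded = rnorm(2000, mean = 4.0, sd = 1.5),
  RedAvgSigStrLanded  = rnorm(2000, mean = 4.3, sd = 1.5)
)

# Hypothetical helper: resample a fraction `frac` of the rows with replacement
# `n_sims` times and return the mean of `column` from each resample.
simulate_mean <- function(data, column, n_sims = 1000, frac = 0.5) {
  n_rows <- round(nrow(data) * frac)
  replicate(n_sims, {
    rows <- sample(nrow(data), size = n_rows, replace = TRUE)
    mean(data[[column]][rows], na.rm = TRUE)
  })
}

# Run the same simulation for every column of interest.
columns <- c("BlueAvgSigStrLanded", "RedAvgSigStrLanded")
results <- lapply(setNames(columns, columns),
                  function(col) simulate_mean(fights, col))

# One row per column: centre, spread, and a 95% percentile interval
# of the simulated means.
summary_table <- do.call(rbind, lapply(results, function(x) {
  data.frame(
    mean     = mean(x),
    sd       = sd(x),
    lower_95 = unname(quantile(x, 0.025)),
    upper_95 = unname(quantile(x, 0.975))
  )
}))

print(summary_table)
```

Each element of `results` can be passed to `hist()` exactly as in the chunks above, for example `hist(results$RedAvgSigStrLanded)`. The table adds numbers for the centre and spread that the histograms only show visually.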


2024-10-09
