# Importing libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
data <- read.csv('C:/Downloads/final_dataset.csv')
colnames(data)
## [1] "X" "Date" "HomeTeam" "AwayTeam"
## [5] "FTHG" "FTAG" "FTR" "HTGS"
## [9] "ATGS" "HTGC" "ATGC" "HTP"
## [13] "ATP" "HM1" "HM2" "HM3"
## [17] "HM4" "HM5" "AM1" "AM2"
## [21] "AM3" "AM4" "AM5" "MW"
## [25] "HTFormPtsStr" "ATFormPtsStr" "HTFormPts" "ATFormPts"
## [29] "HTWinStreak3" "HTWinStreak5" "HTLossStreak3" "HTLossStreak5"
## [33] "ATWinStreak3" "ATWinStreak5" "ATLossStreak3" "ATLossStreak5"
## [37] "HTGD" "ATGD" "DiffPts" "DiffFormPts"
1- FTHG = Full-time home goals.
This column records the goals scored by the home team at full time. From the name alone, it may not be immediately clear what FTHG stands for unless documentation is provided.
FTHG is a commonly used abbreviation in sports data.
Even so, it can cause confusion when analyzing match data, since many readers may not know that “FTHG” stands for full-time home goals without documentation.
2- FTR = Full-time result.
This column records the full-time result of a match. It contains values such as “H” (home win) and “NH” (not home), which may not be self-explanatory without context.
Using “FTR” to represent the full-time result is common in football data.
Still, it may not be immediately obvious that “FTR” stands for “Full Time Result”, which in turn makes the values “H” and “NH” hard to interpret.
3- ATGS = Away team goals scored.
This column records the goals scored by the away team. Without context, it is not obvious at first what “ATGS” represents, so documentation is needed to provide that context and make the data easier to understand.
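One way to supply the missing context is a small data dictionary kept alongside the data. The sketch below is a minimal illustration in base R, not part of the original analysis: the `data_dictionary` table and the `describe_column()` helper are hypothetical names introduced here, and the descriptions simply restate the three definitions above.

```r
# Hypothetical data dictionary: one row per abbreviated column.
# Descriptions restate the definitions given in this section.
data_dictionary <- data.frame(
  column = c("FTHG", "FTR", "ATGS"),
  description = c(
    "Full-time home goals",
    "Full-time result",
    "Away team goals scored"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up the description for a column name.
# Returns NA when the column is not documented in the dictionary.
describe_column <- function(column_name, dictionary = data_dictionary) {
  idx <- match(column_name, dictionary$column)
  dictionary$description[idx]
}

describe_column("FTHG")
describe_column("HTGS")  # not in the dictionary, so the lookup gives NA
```

A lookup that returns NA is a quick signal that a column still needs documenting before it is used in analysis.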
The “FTR” column gives the full-time result of a match, but the documentation might not fully explain what its values mean. It contains “H” for a home win and “NH”, read here as an away win. Without context, it might not be immediately clear what these values represent.
The documentation might not state explicitly that “H” stands for the home team winning and “NH” for the away team winning. If the codes are not described correctly, the data can be interpreted in different ways during analysis.
If the “FTR” column is not clearly defined, the accuracy of any full-time result analysis suffers, and that analysis is fundamental to understanding match outcomes.
This can be improved by stating the exact meaning of each code in the “FTR” column, so that it is clear that “H” represents a home win and “NH” represents an away win.
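The same improvement can be applied in code by recoding the raw values into descriptive labels before analysis. The following is a minimal sketch on a small made-up vector, not the real dataset. The names `ftr_example`, `ftr_labels`, and `ftr_described` are hypothetical, and the label chosen for “NH” follows the reading used in this section, which should be confirmed against the dataset's own documentation.

```r
# Made-up example values standing in for the FTR column (not real data).
ftr_example <- c("H", "NH", "H", "H", "NH")

# Hypothetical mapping from raw codes to descriptive labels.
# Assumption: "NH" is labelled as an away win, as read in this section.
ftr_labels <- c(H = "Home win", NH = "Away win")

# Recode the raw codes into a factor with self-explanatory labels.
ftr_described <- factor(
  ftr_example,
  levels = names(ftr_labels),
  labels = unname(ftr_labels)
)

# Counts are now reported under the descriptive labels.
table(ftr_described)
```

Keeping the mapping in one named vector means the meaning of each code is stated once and reused in every table and plot.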
# Bar chart of full-time results, annotated to flag the codes that lack clear documentation
max_count <- max(table(data$FTR), na.rm = TRUE)  # height of the tallest bar

ggplot(data, aes(x = FTR)) +
  geom_bar(fill = "orange") +
  labs(
    title = "Full-Time Results",
    x = "Result",
    y = "Count"
  ) +
  # Place each note just above the tallest bar
  annotate(
    "text",
    x = "H",
    y = max_count + 1,
    label = "Unclear: H = Home Win",
    color = "black"
  ) +
  annotate(
    "text",
    x = "NH",
    y = max_count + 2,
    label = "Unclear: NH = Away Win",
    color = "black"
  )