library(dplyr, verbose = FALSE)
library(ggplot2, verbose = FALSE)
transferData <- read.csv("transfer_data.csv", stringsAsFactors = FALSE)
cat("There have been", length(transferData$PLAYER),"transfers that took place from 2007 - 2017.")
## There have been 6237 transfers that took place from 2007 - 2017.
cat("There have been", length(unique(transferData$PLAYER)),"players transferred from 2007 - 2017.")
## There have been 4167 players transferred from 2007 - 2017.
There are usually two transfer windows: the summer (pre-season) window and the mid-season window.
transferData %>%
filter(is.na(WINDOW)) %>%
nrow()
## [1] 1
So there is one row with a missing WINDOW value. Let's remove that row.
transferData <- transferData %>%
filter(!is.na(WINDOW))
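Before dropping rows, it can help to see which columns actually hold the missing values. The sketch below is only an illustration: `toy` is a hypothetical, made-up data frame standing in for `transferData`, and the counts it prints say nothing about the real data.

```r
# Toy stand-in for transferData; all values are invented for illustration.
toy <- data.frame(
  PLAYER = c("A", "B", "C", "D"),
  WINDOW = c("Pre-Season", NA, "Mid-Season", "Pre-Season"),
  SEASON = c("14/15", "14/15", NA, "15/16"),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Keep only rows where WINDOW is present (base R equivalent of filter(!is.na(WINDOW)))
toy_clean <- toy[!is.na(toy$WINDOW), ]
print(nrow(toy_clean))
```

A per-column count like this shows whether the single missing WINDOW is an isolated gap or part of a mostly empty row.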
df <- transferData %>%
group_by(WINDOW) %>%
summarise(Percent = round((n()*100)/nrow(transferData)) )
df %>%
ggplot(aes(x=WINDOW, y=Percent)) + geom_bar(stat='identity', fill='tomato') +
ggtitle("Percentage of transfers per window (2007-2017)") +
geom_label(label=df$Percent)
78% of the transfers happen during the pre-season window.
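The per-window share can also be cross-checked with base R. This is a minimal sketch, assuming a hypothetical vector `window` of window labels; the values are made up and are not the real transfer data.

```r
# Invented window labels, for illustration only (7 pre-season, 3 mid-season)
window <- c(rep("Pre-Season", 7), rep("Mid-Season", 3))

# table() counts each label, prop.table() turns counts into proportions
share <- round(100 * prop.table(table(window)))
print(share)
```

Because each share is rounded separately, the rounded percentages may not always sum to exactly 100.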
transferData$SEASON[transferData$WINDOW == "Mid-Season" & transferData$SEASON == "15"] <- "14/15"
transferData$SEASON[transferData$WINDOW == "Mid-Season" & transferData$SEASON == "16"] <- "15/16"
transferData$SEASON[transferData$WINDOW == "Pre-Season" & transferData$SEASON == "15"] <- "15/16"
transferData$SEASON[transferData$WINDOW == "Pre-Season" & transferData$SEASON == "16"] <- "16/17"
transferData %>%
group_by(WINDOW, SEASON) %>%
summarise(Count = n()) %>%
ggplot( aes(x=SEASON, y=Count, group=WINDOW)) +
geom_line(aes(color=WINDOW))+
geom_point(aes(color=WINDOW)) +
theme_bw()
The number of transfers has increased from season to season. The teams appear to be trying to establish their presence by making as many transfers as possible each season.
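One way to check the "increasing every season" reading is to look at the change between consecutive seasons. The sketch below assumes a hypothetical named vector `counts` of transfers per season; the numbers are invented, so only the technique carries over.

```r
# Invented transfer counts per season, for illustration only
counts <- c("14/15" = 10, "15/16" = 14, "16/17" = 19)

# Difference between each season and the one before it
change <- diff(counts)
print(change)

# TRUE only if every season has more transfers than the previous one
always_increasing <- all(change > 0)
print(always_increasing)
```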