English Premier League Analysis1.Does the home stadium give an advantage and if yes how?
2.What is the best way of winning matches, is it defensive or attacking.
3.Who is the best coach in premier league history.
Data<-read.csv("C:\\Users\\CHOLA\\Desktop\\my_data.csv")
head(Data)
## Competition_Name Gender Country Season_End_Year Round Wk Day Date Time
## 1 Premier League M ENG 1993 NA 1 Sat 1992-08-15 <NA>
## 2 Premier League M ENG 1993 NA 1 Sat 1992-08-15 <NA>
## 3 Premier League M ENG 1993 NA 1 Sat 1992-08-15 <NA>
## 4 Premier League M ENG 1993 NA 1 Sat 1992-08-15 <NA>
## 5 Premier League M ENG 1993 NA 1 Sat 1992-08-15 <NA>
## 6 Premier League M ENG 1993 NA 1 Sat 1992-08-15 <NA>
## Home HomeGoals Away AwayGoals Attendance Venue
## 1 Southampton 0 Tottenham 0 19654 The Dell
## 2 Coventry City 2 Middlesbrough 1 12681 Highfield Road
## 3 Sheffield Utd 2 Manchester Utd 1 28070 Bramall Lane
## 4 Arsenal 2 Norwich City 4 24030 Highbury
## 5 Crystal Palace 3 Blackburn 3 17086 Selhurst Park
## 6 Chelsea 1 Oldham Athletic 1 20699 Stamford Bridge
## Referee Notes
## 1 Vic Callow <NA>
## 2 Howard King <NA>
## 3 Brian Hill <NA>
## 4 Alan Gunn <NA>
## 5 Roger Milford <NA>
## 6 Jim Borrett <NA>
## MatchURL
## 1 https://fbref.com/en/matches/0a1cbb88/Southampton-Tottenham-Hotspur-August-15-1992-Premier-League
## 2 https://fbref.com/en/matches/2858b9c3/Coventry-City-Middlesbrough-August-15-1992-Premier-League
## 3 https://fbref.com/en/matches/3e801f77/Sheffield-United-Manchester-United-August-15-1992-Premier-League
## 4 https://fbref.com/en/matches/65bc8b32/Arsenal-Norwich-City-August-15-1992-Premier-League
## 5 https://fbref.com/en/matches/72aacc4b/Crystal-Palace-Blackburn-Rovers-August-15-1992-Premier-League
## 6 https://fbref.com/en/matches/aebac3a1/Chelsea-Oldham-Athletic-August-15-1992-Premier-League
## Home_xG Away_xG
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
names(Data)
## [1] "Competition_Name" "Gender" "Country" "Season_End_Year"
## [5] "Round" "Wk" "Day" "Date"
## [9] "Time" "Home" "HomeGoals" "Away"
## [13] "AwayGoals" "Attendance" "Venue" "Referee"
## [17] "Notes" "MatchURL" "Home_xG" "Away_xG"
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
PL<-Data %>%
arrange(Season_End_Year,Wk)%>%
select(Season_End_Year,Wk,Date,Home,HomeGoals,AwayGoals,Away,Date) %>% mutate(Date=ymd(Date))
This shows that for 2023-2024 Company we are remaining with 131 games
colSums(is.na(PL))
## Season_End_Year Wk Date Home HomeGoals
## 0 0 0 0 131
## AwayGoals Away
## 131 0
We now exclude the un played matches
str(PL)
## 'data.frame': 12406 obs. of 7 variables:
## $ Season_End_Year: int 1993 1993 1993 1993 1993 1993 1993 1993 1993 1993 ...
## $ Wk : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Date : Date, format: "1992-08-15" "1992-08-15" ...
## $ Home : chr "Southampton" "Coventry City" "Sheffield Utd" "Arsenal" ...
## $ HomeGoals : int 0 2 2 2 3 1 2 1 1 1 ...
## $ AwayGoals : int 0 1 1 4 3 1 1 1 1 0 ...
## $ Away : chr "Tottenham" "Middlesbrough" "Manchester Utd" "Norwich City" ...
We now add the final description of home,away and draw(H=Home Win, D=Draw, A=Away Win)
PL$FTR<- case_when(
PL$HomeGoals>PL$AwayGoals~"HOME_WINS",
PL$HomeGoals<PL$AwayGoals~"AWAY_WINS",
PL$HomeGoals==PL$AwayGoals~"DRAWS"
)
Descriptive by summary
summary(PL)
## Season_End_Year Wk Date Home
## Min. :1993 Min. : 1.00 Min. :1992-08-15 Length:12406
## 1st Qu.:2000 1st Qu.:10.00 1st Qu.:2000-01-03 Class :character
## Median :2008 Median :20.00 Median :2008-02-10 Mode :character
## Mean :2008 Mean :19.72 Mean :2008-03-19
## 3rd Qu.:2016 3rd Qu.:29.00 3rd Qu.:2016-04-09
## Max. :2024 Max. :42.00 Max. :2024-05-19
##
## HomeGoals AwayGoals Away FTR
## Min. :0.000 Min. :0.000 Length:12406 Length:12406
## 1st Qu.:1.000 1st Qu.:0.000 Class :character Class :character
## Median :1.000 Median :1.000 Mode :character Mode :character
## Mean :1.529 Mean :1.149
## 3rd Qu.:2.000 3rd Qu.:2.000
## Max. :9.000 Max. :9.000
## NA's :131 NA's :131
We see that week has a maximum of 42 which means games are played within 42 weeks
table(PL$Season_End_Year)
##
## 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008
## 462 462 462 380 380 380 380 380 380 380 380 380 380 380 380 380
## 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024
## 380 380 380 380 380 380 380 380 380 380 380 380 380 380 380 380
We see that from 1993 to 1995 the number of games that were played were 462 this is because by then the number of teams were 22 and were reduced to 20 in 1996.
#We are going to generate a new dataframe that includes the percentage of different game results, where 'H' denotes a home team win, 'A' denotes an away team win, and 'D' denotes a draw.
Home_vs_Away<-count(PL,FTR) %>% arrange(desc(n))
Home_vs_Away$percentage<-(Home_vs_Away$n/sum((Home_vs_Away$n)))*100
ggplot(data = Home_vs_Away,aes(x=reorder(FTR,-percentage),y=percentage))+
geom_col()+labs(title = "Percentage winning rate")
home_vs_away<-na.omit(Home_vs_Away)
ggplot(data=home_vs_away,mapping = aes(x=FTR,y=percentage))+geom_col()+coord_flip()+labs(title = "Games won at Home and away")
By comparing the win percentages for home and away games, you can quantify the advantage of playing at the home ground. Typically, you would expect the win percentage for home games to be higher than for away games, indicating a home ground advantage.
The NA shows the number of games not played as this data goes up to 2024 company and we still have remaining games.
As We see that the number of games won at Home were greater than the draws and away games.
Now we will find out with what chances or probability do teams have of wining at home and away.
home_vs_away$probability<-(home_vs_away$n/sum((home_vs_away$n)))
home_vs_away
## FTR n percentage probability
## 1 HOME_WINS 5631 45.38933 0.4587373
## 2 AWAY_WINS 3496 28.17991 0.2848065
## 3 DRAWS 3148 25.37482 0.2564562
By calculating the probability using the equation \(P(A) = {n(S) \over n(A)}\) Where:
\(P(A)\) is the probability of an outcome
\(n(S)\) is the number of favorable outcomes for event \(A\).
\(n(S)\) is the total number of possible outcomes in the sample space.
We see that the probability of wining at home is \(0.4587373\) and hence we can conclude that teams have great chances of winning at home than away.
## new dataframe counting final results by every year
point_year<-PL%>%
group_by(Season_End_Year)%>%
count(FTR)
point_year
## # A tibble: 97 × 3
## # Groups: Season_End_Year [32]
## Season_End_Year FTR n
## <int> <chr> <int>
## 1 1993 AWAY_WINS 118
## 2 1993 DRAWS 130
## 3 1993 HOME_WINS 214
## 4 1994 AWAY_WINS 128
## 5 1994 DRAWS 142
## 6 1994 HOME_WINS 192
## 7 1995 AWAY_WINS 123
## 8 1995 DRAWS 134
## 9 1995 HOME_WINS 205
## 10 1996 AWAY_WINS 96
## # ℹ 87 more rows
We will now calculate the number of points by away game and home games
## calculating the points that collected in a home and away ground
point_year$points<-case_when(point_year$FTR=="HOME_WINS"~point_year$n*3,
point_year$FTR=="AWAY_WINS"~point_year$n*3,
point_year$FTR=="DRAWS"~point_year$n*1
)
point_year
## # A tibble: 97 × 4
## # Groups: Season_End_Year [32]
## Season_End_Year FTR n points
## <int> <chr> <int> <dbl>
## 1 1993 AWAY_WINS 118 354
## 2 1993 DRAWS 130 130
## 3 1993 HOME_WINS 214 642
## 4 1994 AWAY_WINS 128 384
## 5 1994 DRAWS 142 142
## 6 1994 HOME_WINS 192 576
## 7 1995 AWAY_WINS 123 369
## 8 1995 DRAWS 134 134
## 9 1995 HOME_WINS 205 615
## 10 1996 AWAY_WINS 96 288
## # ℹ 87 more rows
We will now find the number of points accumulated at home and away in each year
## creating a new column"h_points" that summation points that collected in home ground either "h_points" to away points
point_year2<-na.omit(point_year)%>%
group_by(Season_End_Year)%>%
summarize(home_points=points[FTR=="HOME_WINS"]+points[FTR=="DRAWS"],away_points=(points[FTR=="AWAY_WINS"])+points[FTR=="DRAWS"])
point_year2
## # A tibble: 32 × 3
## Season_End_Year home_points away_points
## <int> <dbl> <dbl>
## 1 1993 772 484
## 2 1994 718 526
## 3 1995 749 503
## 4 1996 656 386
## 5 1997 605 416
## 6 1998 647 398
## 7 1999 622 403
## 8 2000 653 395
## 9 2001 653 386
## 10 2002 596 443
## # ℹ 22 more rows
We will now make the table in to a tidy table
## tidying the data
point_year3<-point_year2%>%
pivot_longer(c(`home_points`, `away_points`), names_to = "Home_vs_Away", values_to = "Points")
point_year3
## # A tibble: 64 × 3
## Season_End_Year Home_vs_Away Points
## <int> <chr> <dbl>
## 1 1993 home_points 772
## 2 1993 away_points 484
## 3 1994 home_points 718
## 4 1994 away_points 526
## 5 1995 home_points 749
## 6 1995 away_points 503
## 7 1996 home_points 656
## 8 1996 away_points 386
## 9 1997 home_points 605
## 10 1997 away_points 416
## # ℹ 54 more rows
ggplot(data = point_year3,aes(x=Season_End_Year,y=Points,col=Home_vs_Away))+
geom_line()+
theme(legend.title=element_blank())+
theme(legend.position=c(0.9,0.9))+
scale_color_manual(labels = c("Away", "Home"),
values = c( "red", "blue"))+
ggtitle("Total points collected by the teams Home vs Away")+
xlab("Season end year")+
ylab("Total points")
We see that in \(2020\) things changed, the away games won are more than the home games. We will try to evaluate and see what cold be the cause of this.
The home winning percetage through the years
## new dataframe counting the Home, Away and Draw through the years
home_vs_away_years<-PL %>%
group_by(Season_End_Year,FTR)%>%
count(FTR)
##the percentage of every case
home_vs_away_years$Season_End_Year<-as.numeric(home_vs_away_years$Season_End_Year)
home_vs_away_years$percentage<- case_when(
home_vs_away_years$Season_End_Year==1993~home_vs_away_years$n*100/462,
home_vs_away_years$Season_End_Year==1994~home_vs_away_years$n*100/462,
home_vs_away_years$Season_End_Year==1995~home_vs_away_years$n*100/462,
TRUE ~ home_vs_away_years$n*100/380
)
home_vs_away_year<-na.omit(home_vs_away_years)
home_vs_away_year
## # A tibble: 96 × 4
## # Groups: Season_End_Year, FTR [96]
## Season_End_Year FTR n percentage
## <dbl> <chr> <int> <dbl>
## 1 1993 AWAY_WINS 118 25.5
## 2 1993 DRAWS 130 28.1
## 3 1993 HOME_WINS 214 46.3
## 4 1994 AWAY_WINS 128 27.7
## 5 1994 DRAWS 142 30.7
## 6 1994 HOME_WINS 192 41.6
## 7 1995 AWAY_WINS 123 26.6
## 8 1995 DRAWS 134 29.0
## 9 1995 HOME_WINS 205 44.4
## 10 1996 AWAY_WINS 96 25.3
## # ℹ 86 more rows
We will now find the year with the highest away win
away_games_year<-home_vs_away_year %>% filter(FTR=="AWAY_WINS")
ggplot(data = away_games_year,mapping = aes(x=Season_End_Year,y=percentage))+geom_line()+labs(title = "Plot of the away games winning percentage" )
We can see that the percentage of wining away games was higher in 2020 , the reason is because in \(2020\) we had Covid-19 so supporters were not allowed to watch football on stadium , everything was virtue . Hence we can conclude that supporters play a pivot role to ensure home win games.
plotting the Home winning percentage through the years
Home_wins<-home_vs_away_year %>% filter(FTR=="HOME_WINS")
ggplot(data = Home_wins,mapping = aes(x=Season_End_Year,y=percentage))+geom_line()+labs(title = "Home winning percentage")
Observation
The home team’s success rate remained consistently above 40% over the years until experiencing a notable decline in 2021, before eventually rebounding to its usual levels. We will revisit this particular scenario later in our analysis to determine its potential influence on scoring goals at the home stadium.
Now we will try to find the effect home field have on the number of goals scored?
home_goals_vs_away_goals<-PL %>%
group_by(Season_End_Year)%>%
summarise(all_home_goals=sum(HomeGoals),all_away_goals=sum(AwayGoals))
we can now make the table tidy
home_goals_vs_away_goals2<-home_goals_vs_away_goals %>% pivot_longer(cols = c('all_home_goals','all_away_goals'),names_to = 'Home_and_away_goals',values_to = 'total_goals')
home_vs_away_goals3<-na.omit(home_goals_vs_away_goals2)
ggplot(data = home_vs_away_goals3,mapping = aes(x=Season_End_Year,y=total_goals,col=Home_and_away_goals,))+geom_line()
The home advantage, evident in both points accumulation and goal-scoring throughout the years, was disrupted notably in \(2021\). Upon closer examination of this period, it became apparent that \(2021\) coincided with the onset of the COVID-19 pandemic, resulting in matches being played without spectators. Delving deeper into this timeframe, data from “premierleague.com” reveals that the last 92 matches of the \(2019-2020\) season were played behind closed doors, and the entirety of the \(2020–21\) season followed suit, with only brief exceptions in December \(2020\) and May \(2021\) when limited numbers of fans were permitted. This prompts us to investigate the impact of crowd absence throughout the \(2020–2021\) season on various aspects of the game.
## new dataframe with only results of 2020-2021 season
season_2021<-subset(PL,Season_End_Year=="2021")
Home vs Away winning in 2021 season
point_2021<-season_2021%>%
group_by(Season_End_Year)%>%
count(FTR)
point_2021
## # A tibble: 3 × 3
## # Groups: Season_End_Year [1]
## Season_End_Year FTR n
## <int> <chr> <int>
## 1 2021 AWAY_WINS 153
## 2 2021 DRAWS 83
## 3 2021 HOME_WINS 144
plotting the result
ggplot(data =point_2021,mapping=aes(x=FTR,y=n))+
geom_col()+coord_flip()+
ggtitle("Home vs Away winning in 2021 season")+
ylab("Numbers of matches")+
xlab("The Results")
The absence of a crowd has evidently nullified the home advantage in sports. Now, shifting our focus to team analysis, we’ll begin by identifying the team with the highest number of victories at their home stadium.
This exploration will provide insights into teams’ strengths and performances in their familiar home environments.
## new dataframe to calculate the home results for each PL team
home_point<-PL %>%
group_by(Home)%>%
count(FTR)
Home_points<-na.omit(home_point)
## replacing "H" by "W" for winning, "A" by "L" for losing and "D" still "D" for draw
Home_points$FTR[Home_points$FTR=="HOME_WINS"]<-"W"
Home_points$FTR[Home_points$FTR=="AWAY_WINS"]<-"L"
Home_points$FTR[Home_points$FTR=="DRAWS"]<-"D"
## calculating the points collected at home ground by each team
## as we know the winning team get three points, the losing team take nothing, and 1 point for each team during draw
Home_points$points<- case_when(Home_points$FTR=="L"~Home_points$n*0,
Home_points$FTR=="W"~Home_points$n*3,
Home_points$FTR=="D"~Home_points$n*1
)
home_point2<-Home_points%>%
group_by(Home)%>%
summarize(T_point=sum(points))
plotting the result of points in home ground
ggplot(data = home_point2,aes(x=T_point,y=reorder(Home,T_point),fill=Home))+
geom_col()+
ggtitle("Points by team in Home ground")+
ylab("Team")+
xlab("Points")
Manchester United stands out as the team with the highest number of points accumulated at their home stadium, followed closely by Arsenal, Liverpool, and Chelsea.
Let’s move to the other side and see the teams’ performances individually in away matches.
## new dataframe to calculate the home results for each PL team
away_point<-PL %>%
group_by(Away)%>%
count(FTR)
Away_point<-na.omit(away_point)
## replacing "H" by "L" for losing, "A" by "W" for winning and "D" still "D" for draw
Away_point$FTR[Away_point$FTR=="HOME_WINS"]<-"L"
Away_point$FTR[Away_point$FTR=="AWAY_WINS"]<-"W"
Away_point$FTR[Away_point$FTR=="DRAWS"]<-"D"
## calculating the points collected at away ground by each team
## as we know the winning team get three points, the losing team take nothing, and 1 point for each team during draw
Away_point$points<- case_when(Away_point$FTR=="L"~Away_point$n*0,
Away_point$FTR=="W"~Away_point$n*3,
Away_point$FTR=="D"~Away_point$n*1
)
away_point2<-Away_point%>%
group_by(Away)%>%
summarize(T_point=sum(points))
plotting the result of away point accumulated
ggplot(data = away_point2,aes(x=T_point,y=reorder(Away,T_point),fill=Away))+
geom_col()+
ggtitle("Points by team in Away ground")+
ylab("Team")+
xlab("Points")
Manchester United holds the top position in terms of points collected at their home stadium, with Chelsea, Arsenal, and Liverpool following closely behind.
We will now try to find What is the best way of winning matches, is it defensive or attacking.
In this section, we’ll evaluate different techniques for collecting points based on their offensive and defensive capabilities, primarily through goals scored and goals received. We’ll analyze the number of goals scored and the average points collected by the team, as well as the number of goals conceded and the average points collected.
## to see how many Home teams scored by end of the match
table(PL$HomeGoals)
##
## 0 1 2 3 4 5 6 7 8 9
## 2865 3972 2973 1503 635 219 69 28 7 4
## to see how many Away teams scored by end of the match
table(PL$AwayGoals)
##
## 0 1 2 3 4 5 6 7 8 9
## 4165 4250 2354 1044 336 90 30 3 2 1
We see that both scores are in the range of 0 to 9
PL_r<-na.omit(PL)
## creating a matrix with 4 variables
## variable for goals the team scored in a match "Goal_scored" and next to it the average point they got "points_s",
## and variable for goals the team received in a match "Goal_received" and next to it the average point they got "points_A".
## and our observations the range from 0 to 9
matrix<-matrix(nrow = 10,ncol = 4)
colnames(matrix)=c("Goal_scored","points_s","Goal_received","points_A")
matrix[,1]<-c(0:9)
matrix[,3]<-c(0:9)
## creating a loop to calculate the average point when scoring a goals in a range from 0 to 9
for(i in 0:9)
{
## in Home ground
home_score_i<-subset(PL_r,PL_r$HomeGoals==i)
home_score_i$points_h_i<-case_when(home_score_i$FTR=="HOME_WINS"~3,
home_score_i$FTR=="DRAWS"~1,
home_score_i$FTR=="AWAY_WINS"~0
)
all_points_gains_at_home_when_scored_i<-sum(home_score_i$points_h_i)
## in away grounds
away_score_i<-subset(PL_r,PL_r$AwayGoals==i)
away_score_i$points_a_i<-case_when(away_score_i$FTR=="HOME_WINS"~0,
away_score_i$FTR=="DRAWS"~1,
away_score_i$FTR=="AWAY_WINS"~3
)
all_points_gains_at_away_when_scored_i<-sum(away_score_i$points_a_i)
all_points_gains_when_scored_i<-all_points_gains_at_away_when_scored_i+all_points_gains_at_home_when_scored_i
average_points_when_scored_i<-all_points_gains_when_scored_i/(nrow(home_score_i)+nrow(away_score_i))
matrix[i+1,2]<-(average_points_when_scored_i)
}
## creating a loop to calculate the average point when receiving a goals in a range from 0 to 9
for(j in 0:9)
{
## in Home ground
home_against_j<-subset(PL_r,PL_r$AwayGoals==j)
home_against_j$points_h_j<-case_when(home_against_j$FTR=="HOME_WINS"~3,
home_against_j$FTR=="DRAWS"~1,
home_against_j$FTR=="AWAY_WINS"~0
)
all_points_gains_at_home_when_against_j<-sum(home_against_j$points_h_j)
## in Away grounds
away_against_j<-subset(PL_r,PL_r$HomeGoals==j)
away_against_j$points_a_j<-case_when(away_against_j$FTR=="HOME_WINS"~0,
away_against_j$FTR=="DRAWS"~1,
away_against_j$FTR=="AWAY_WINS"~3
)
all_points_gains_at_away_when_against_j<-sum(away_against_j$points_a_j)
all_points_gains_when_against_j<-all_points_gains_at_away_when_against_j+all_points_gains_at_home_when_against_j
average_points_when_against_j<-all_points_gains_when_against_j/(nrow(home_against_j)+nrow(away_against_j))
matrix[j+1,4]<-(average_points_when_against_j)
}
goals_points<-as.data.frame(matrix)
head(goals_points,10)
## Goal_scored points_s Goal_received points_A
## 1 0 0.2805121 0 2.438975818
## 2 1 1.1432741 1 1.513986865
## 3 2 2.1334710 2 0.639384269
## 4 3 2.6556733 3 0.242245779
## 5 4 2.9052523 4 0.059732235
## 6 5 2.9870550 5 0.006472492
## 7 6 3.0000000 6 0.000000000
## 8 7 3.0000000 7 0.000000000
## 9 8 3.0000000 8 0.000000000
## 10 9 3.0000000 9 0.000000000
## tidying and plotting the data
goals_points2<-goals_points%>%
pivot_longer(c(`points_s`, `points_A`), names_to = "scored_vs_against", values_to = "Points")
ggplot(data = goals_points2,aes(x=Goal_scored,y=Points,col=scored_vs_against))+
geom_line()+
theme(legend.title=element_blank())+
theme(legend.position=c(0.05,0.94))+
scale_color_manual(labels = c("against", "scored"),
values = c( "red", "blue"))+
ggtitle("average points when the teams scored vs received a goals")+
xlab("Goals")+
ylab("points")
The outcome is somewhat unexpected, as it reveals that, on average, a team that keeps a clean sheet gathers 2.43 points, surpassing the average of 2.13 points collected by a team scoring two goals. Additionally, a team conceding only one goal accumulates more points on average than a team scoring just one goal.
## creating a list that will contain every PL table
PL_Table<-list()
## creating a loop to deal with the data year by year
for(i in 1993:2022)
{
## creating subset contain every year results
PL_r_i<-subset(PL_r,PL_r$Season_End_Year==i)
## count of home goals, scored and against
PL_r_hgi<-PL_r_i%>%
group_by(Home)%>%
summarize(goal_scored_at_home=sum(HomeGoals),
goal_against_at_home=sum(AwayGoals))
## count of away goals, scored and against
PL_r_agi<-PL_r_i%>%
group_by(Away)%>%
summarize(goal_scored_at_away=sum(AwayGoals),
goal_against_at_away=sum(HomeGoals))
##home_and_away_goals
goals_i<-cbind(PL_r_hgi,PL_r_agi)
goals_i$GS<-(goals_i$goal_scored_at_away)+(goals_i$goal_scored_at_home)
goals_i$GA<-(goals_i$goal_against_at_away)+(goals_i$goal_against_at_home)
goals_i$GD<-(goals_i$GS)-(goals_i$GA)
goals2_i<-goals_i%>%
select(Home,GS,GA,GD)
##home result
PL_r_hri<-PL_r_i%>%
group_by(Home)%>%
count(FTR)
PL_r_hri$W<-case_when(PL_r_hri$FTR=="HOME_WINS"~PL_r_hri$n*1,
PL_r_hri$FTR=="DRAWS"~0,
PL_r_hri$FTR=="AWAY_WINS"~0
)
PL_r_hri$D<-case_when(PL_r_hri$FTR=="HOME_WINS"~0,
PL_r_hri$FTR=="DRAWS"~PL_r_hri$n*1,
PL_r_hri$FTR=="AWAY_WINS"~0
)
PL_r_hri$L<-case_when(PL_r_hri$FTR=="HOME_WINS"~0,
PL_r_hri$FTR=="DRAWS"~0,
PL_r_hri$FTR=="AWAY_WINS"~PL_r_hri$n*1
)
PL_r2_hri<-PL_r_hri%>%
group_by(Home)%>%
summarize(Wh=sum(W),Dh=sum(D),Lh=sum(L))
PL_r2_hri$hpoints<-(PL_r2_hri$Wh*3)+(PL_r2_hri$Dh*1)
##away result
PL_r_ari<-PL_r_i%>%
group_by(Away)%>%
count(FTR)
PL_r_ari$W<-case_when(PL_r_ari$FTR=="HOME_WINS"~0,
PL_r_ari$FTR=="DRAWS"~0,
PL_r_ari$FTR=="AWAY_WINS"~PL_r_ari$n*1
)
PL_r_ari$D<-case_when(PL_r_ari$FTR=="HOME_WINS"~0,
PL_r_ari$FTR=="DRAWS"~PL_r_ari$n*1,
PL_r_ari$FTR=="AWAY_WINS"~0
)
PL_r_ari$L<-case_when(PL_r_ari$FTR=="HOME_WINS"~PL_r_ari$n*1,
PL_r_ari$FTR=="DRAWS"~0,
PL_r_ari$FTR=="AWAY_WINS"~0
)
PL_r2_ari<-PL_r_ari%>%
group_by(Away)%>%
summarize(Wa=sum(W),Da=sum(D),La=sum(L))
PL_r2_ari$apoints<-(PL_r2_ari$Wa*3)+(PL_r2_ari$Da*1)
##home and away points
points_i<-cbind(PL_r2_ari,PL_r2_hri)
points_i$W<-(points_i$Wa)+(points_i$Wh)
points_i$D<-(points_i$Da)+(points_i$Dh)
points_i$L<-(points_i$La)+(points_i$Lh)
points_i$points<-(points_i$apoints)+(points_i$hpoints)
points2_i<-points_i%>%
select(Away,W,D,L,points)
Table_i<-cbind(goals2_i,points2_i)
Table2_i<-Table_i%>%
select(Home,W,D,L,GS,GA,GD,points)
Table3_i<-arrange(Table2_i,desc(points),desc(GD),desc(GS))
names(Table3_i)[names(Table3_i) == 'Home'] <- 'Team'
Table3_i$Season<-paste0(i-1,"/",i)
Table3_i$Rank<-1:case_when(i==1993~22,
i==1994~22,
i==1995~22,
TRUE ~ 20)
Table3_i<-Table3_i[,c(9,10,1,2,3,4,5,6,7,8)]
PL_Table[[i]]<-Table3_i
}
testing the result
PL_Table[[2022]]
## Season Rank Team W D L GS GA GD points
## 1 2021/2022 1 Manchester City 29 6 3 99 26 73 93
## 2 2021/2022 2 Liverpool 28 8 2 94 26 68 92
## 3 2021/2022 3 Chelsea 21 11 6 76 33 43 74
## 4 2021/2022 4 Tottenham 22 5 11 69 40 29 71
## 5 2021/2022 5 Arsenal 22 3 13 61 48 13 69
## 6 2021/2022 6 Manchester Utd 16 10 12 57 57 0 58
## 7 2021/2022 7 West Ham 16 8 14 60 51 9 56
## 8 2021/2022 8 Leicester City 14 10 14 62 59 3 52
## 9 2021/2022 9 Brighton 12 15 11 42 44 -2 51
## 10 2021/2022 10 Wolves 15 6 17 38 43 -5 51
## 11 2021/2022 11 Newcastle Utd 13 10 15 44 62 -18 49
## 12 2021/2022 12 Crystal Palace 11 15 12 50 46 4 48
## 13 2021/2022 13 Brentford 13 7 18 48 56 -8 46
## 14 2021/2022 14 Aston Villa 13 6 19 52 54 -2 45
## 15 2021/2022 15 Southampton 9 13 16 43 67 -24 40
## 16 2021/2022 16 Everton 11 6 21 43 66 -23 39
## 17 2021/2022 17 Leeds United 9 11 18 42 79 -37 38
## 18 2021/2022 18 Burnley 7 14 17 34 53 -19 35
## 19 2021/2022 19 Watford 6 5 27 34 77 -43 23
## 20 2021/2022 20 Norwich City 5 7 26 23 84 -61 22
How many points collecting by the winner of the PL through the years
winner_points<-matrix(nrow = 30,ncol = 2)
colnames(winner_points)=c("Season_End_Year","winner_points")
winner_points<-as.data.frame(winner_points)
winner_points$Season_End_Year<-1993:2022
for (i in 1993:2022)
{
wi<-PL_Table[[i]]$points[PL_Table[[i]]$Rank==1]
winner_points[i-1992,2]<-wi
}
plotting the result
ggplot(data=winner_points)+
aes(x=Season_End_Year,y=winner_points)+
geom_line()+
ggtitle("points of winners of the PL")+
xlab("Season End Year")+
ylab("Points")
From this we see that in order for a Team to Win EPL it should accumulate at least 75 points.
winners_of_pl<-matrix(nrow = 30,ncol = 2)
winners_of_pl<-as.data.frame(winners_of_pl)
names(winners_of_pl)=c("Season_End_Year","winner")
for(i in 1993:2022)
{
winners_of_pl[i-1992,1]<-paste0(i)
winner_i<-PL_Table[[i]]$Team[PL_Table[[i]]$Rank==1]
winners_of_pl[i-1992,2]<-winner_i
}
n_winners_of_pl<-winners_of_pl%>%
group_by(winner)%>%
count(winner)%>%
arrange(desc(n))
## plotting the result
ggplot(data = n_winners_of_pl,aes(x=n,y=reorder(winner,n),fill=winner))+
geom_col()+
ggtitle("Winners of PL")+
ylab("Team")+
xlab("Numbers of winning times")+
theme(legend.position="none")
It’s clear that Manchester United has been the dominant force in the Premier League, having won numerous titles and trophies, with their last triumph being a decade ago. Upon investigation, it became apparent that a significant factor during their dominance was the coaching of Sir Alex Ferguson. This prompted us to explore the differences between Manchester United under Ferguson’s management and their performance post-Ferguson era. Additionally, we delved into analyzing coaches in the Premier League, focusing on those who have won the title twice or more, in an attempt to determine the best coach in the league’s history.
We now move to our final analysis of Who is the best coach in premier league history.
We will start analysis each manage with the club they managed , among the manager we will analyse the following Sir Alex Ferguson, Pep Guardiola ,Jürgen Klopp and José Mourinho .
We will start with Sir Alex ferguson of Manchester united
It’s evident that Manchester United has been the most successful club in terms of titles and trophies in the Premier League, with their last triumph occurring a decade ago. During our research, we noted that a key factor in Manchester United’s dominance was the coaching tenure of Sir Alex Ferguson. This prompted us to explore the disparities between Manchester United under Ferguson’s management and their performance post-Ferguson era. Furthermore, we sought to investigate the coaches in the Premier League, specifically focusing on those who have won the title twice or more, in order to determine who could be considered the best coach in the league’s history.
## new subset with home results of Manchester United during the period of Sir Alex Ferguson
sir_home<-subset(PL_r,Home=="Manchester Utd"&Season_End_Year<=2013)%>%
arrange(Season_End_Year,Wk)
head(sir_home)
## Season_End_Year Wk Date Home HomeGoals AwayGoals
## 1 1993 2 1992-08-19 Manchester Utd 0 3
## 2 1993 3 1992-08-22 Manchester Utd 1 1
## 3 1993 6 1992-09-02 Manchester Utd 1 0
## 4 1993 7 1992-09-06 Manchester Utd 2 0
## 5 1993 10 1992-09-26 Manchester Utd 0 0
## 6 1993 12 1992-10-18 Manchester Utd 2 2
## Away FTR
## 1 Everton AWAY_WINS
## 2 Ipswich Town DRAWS
## 3 Crystal Palace HOME_WINS
## 4 Leeds United HOME_WINS
## 5 QPR DRAWS
## 6 Liverpool DRAWS
## new subset with away results of Manchester United during the period of Sir Alex Ferguson
sir_away<-subset(PL_r,Away=="Manchester Utd"&Season_End_Year<=2013)%>%
arrange(Season_End_Year,Wk)
head(sir_away)
## Season_End_Year Wk Date Home HomeGoals AwayGoals
## 1 1993 1 1992-08-15 Sheffield Utd 2 1
## 2 1993 4 1992-08-24 Southampton 0 1
## 3 1993 5 1992-08-29 Nott'ham Forest 0 2
## 4 1993 8 1992-09-12 Everton 0 2
## 5 1993 9 1992-09-19 Tottenham 1 1
## 6 1993 11 1992-10-03 Middlesbrough 1 1
## Away FTR
## 1 Manchester Utd HOME_WINS
## 2 Manchester Utd AWAY_WINS
## 3 Manchester Utd AWAY_WINS
## 4 Manchester Utd AWAY_WINS
## 5 Manchester Utd DRAWS
## 6 Manchester Utd DRAWS
##sir points at home
sir_home_point<-sir_home%>%
group_by(FTR)%>%
count(FTR)
sir_home_point$points<-case_when(sir_home_point$FTR=="HOME_WINS"~sir_home_point$n*3,
sir_home_point$FTR=="DRAWS"~sir_home_point$n*1,
sir_home_point$FTR=="AWAY_WINS"~sir_home_point$n*0
)
head(sir_home_point)
## # A tibble: 3 × 3
## # Groups: FTR [3]
## FTR n points
## <chr> <int> <dbl>
## 1 AWAY_WINS 34 0
## 2 DRAWS 66 66
## 3 HOME_WINS 305 915
## sir points away home ground
sir_away_point<-sir_away%>%
group_by(FTR)%>%
count(FTR)
sir_away_point$points<-case_when(sir_away_point$FTR=="HOME_WINS"~sir_away_point$n*0,
sir_away_point$FTR=="DRAWS"~sir_away_point$n*1,
sir_away_point$FTR=="AWAY_WINS"~sir_away_point$n*3
)
head(sir_away_point)
## # A tibble: 3 × 3
## # Groups: FTR [3]
## FTR n points
## <chr> <int> <dbl>
## 1 AWAY_WINS 223 669
## 2 DRAWS 102 102
## 3 HOME_WINS 80 0
## sir goals
##home
sir_goals_home<-sir_home%>%
group_by(Home)%>%
summarize(all_home_goal=sum(sir_home$HomeGoals),all_home_goal_against=sum(sir_home$AwayGoals))
head(sir_goals_home)
## # A tibble: 1 × 3
## Home all_home_goal all_home_goal_against
## <chr> <int> <int>
## 1 Manchester Utd 910 270
##away
sir_goals_away<-sir_away%>%
group_by(Away)%>%
summarize(all_away_goal=sum(sir_away$AwayGoals),all_away_goal_against=sum(sir_away$HomeGoals))
head(sir_goals_away)
## # A tibble: 1 × 3
## Away all_away_goal all_away_goal_against
## <chr> <int> <int>
## 1 Manchester Utd 717 433
Let now see the wins of sir alex at home
sir_home_point %>% ggplot(mapping = aes(x=FTR,y=n))+geom_col()+ggtitle("Sir Alex home games")
Lets now calculate the probability of sir Alex wins ,loses and draws at home
sir_home_point$Probabity<-sir_home_point$n/405
sir_home_point
## # A tibble: 3 × 4
## # Groups: FTR [3]
## FTR n points Probabity
## <chr> <int> <dbl> <dbl>
## 1 AWAY_WINS 34 0 0.0840
## 2 DRAWS 66 66 0.163
## 3 HOME_WINS 305 915 0.753
We see that Sir Alex had \(0.75\) of wining at home ,\(0.16\) drawing and \(0.083\) losing
Let’s now see how sir performed away games
sir_away_point %>% ggplot(mapping = aes(x=FTR,y=n))+geom_col()+ggtitle("The number of games sir won at home")
Lets now calculate sir away wining probability
sir_away_point$Probability <-sir_away_point$n/405
sir_away_point
## # A tibble: 3 × 4
## # Groups: FTR [3]
## FTR n points Probability
## <chr> <int> <dbl> <dbl>
## 1 AWAY_WINS 223 669 0.551
## 2 DRAWS 102 102 0.252
## 3 HOME_WINS 80 0 0.198
We sir that sir Alex away wining probability was \(0.55\) drawing \(0.25\) Losing \(0.197\)
We will now move to Pep Guardiola of Machester City
Pep Guardiola commenced his tenure as Manchester City’s coach at the onset of the 2017 season and has continued in the role up to the present day.
## new subset with home results of Manchester City during the period of Pep Guardiola
pep_home<-subset(PL_r,Home=="Manchester City"&Season_End_Year>=2017)%>%
arrange(Season_End_Year,Wk)
head(pep_home)
## Season_End_Year Wk Date Home HomeGoals AwayGoals
## 1 2017 1 2016-08-13 Manchester City 2 1
## 2 2017 3 2016-08-28 Manchester City 3 1
## 3 2017 5 2016-09-17 Manchester City 4 0
## 4 2017 8 2016-10-15 Manchester City 1 1
## 5 2017 9 2016-10-23 Manchester City 1 1
## 6 2017 11 2016-11-05 Manchester City 1 1
## Away FTR
## 1 Sunderland HOME_WINS
## 2 West Ham HOME_WINS
## 3 Bournemouth HOME_WINS
## 4 Everton DRAWS
## 5 Southampton DRAWS
## 6 Middlesbrough DRAWS
## new subset with away results of Manchester City during the period of Pep Guardiola
pep_away<-subset(PL_r,Away=="Manchester City"&Season_End_Year>=2017)%>%
arrange(Season_End_Year,Wk)
head(pep_away)
## Season_End_Year Wk Date Home HomeGoals AwayGoals
## 1 2017 2 2016-08-20 Stoke City 1 4
## 2 2017 4 2016-09-10 Manchester Utd 1 2
## 3 2017 6 2016-09-24 Swansea City 1 3
## 4 2017 7 2016-10-02 Tottenham 2 0
## 5 2017 10 2016-10-29 West Brom 0 4
## 6 2017 12 2016-11-19 Crystal Palace 1 2
## Away FTR
## 1 Manchester City AWAY_WINS
## 2 Manchester City AWAY_WINS
## 3 Manchester City AWAY_WINS
## 4 Manchester City HOME_WINS
## 5 Manchester City AWAY_WINS
## 6 Manchester City AWAY_WINS
## Pep points at home ground
pep_home_point<-pep_home%>%
group_by(FTR)%>%
count(FTR)
pep_home_point$points<-case_when(pep_home_point$FTR=="HOME_WINS"~pep_home_point$n*3,
pep_home_point$FTR=="DRAWS"~pep_home_point$n*1,
pep_home_point$FTR=="AWAY_WINS"~pep_home_point$n*0
)
head(pep_home_point)
## # A tibble: 3 × 3
## # Groups: FTR [3]
## FTR n points
## <chr> <int> <dbl>
## 1 AWAY_WINS 12 0
## 2 DRAWS 20 20
## 3 HOME_WINS 114 342
## Pep points at away grounds
pep_away_point<-pep_away%>%
group_by(FTR)%>%
count(FTR)
pep_away_point$points<-case_when(pep_away_point$FTR=="AWAY_WINS"~pep_away_point$n*3,
pep_away_point$FTR=="HOME_WINS"~pep_away_point$n*0,
pep_away_point$FTR=="DRAWS"~pep_away_point$n*1
)
head(pep_away_point)
## # A tibble: 3 × 3
## # Groups: FTR [3]
## FTR n points
## <chr> <int> <dbl>
## 1 AWAY_WINS 100 300
## 2 DRAWS 19 19
## 3 HOME_WINS 26 0
## pep goals:
##home
pep_goals_home<-pep_home%>%
group_by(Home)%>%
summarize(all_home_goal=sum(pep_home$HomeGoals),all_home_goal_against=sum(pep_home$AwayGoals))
head(pep_goals_home)
## # A tibble: 1 × 3
## Home all_home_goal all_home_goal_against
## <chr> <int> <int>
## 1 Manchester City 404 116
##away
pep_goals_away<-pep_away%>%
group_by(Away)%>%
summarize(all_away_goal=sum(pep_away$AwayGoals),all_away_goal_against=sum(pep_away$HomeGoals))
head(pep_goals_home)
## # A tibble: 1 × 3
## Home all_home_goal all_home_goal_against
## <chr> <int> <int>
## 1 Manchester City 404 116ggplot(data=pep_home_point,mapping = aes(x=FTR,y=n))+geom_col()+labs(title = " Pep Guardiola's number of wins draws and loses at Home")
From this chart we see that pep has 12 Home loses , 20 draws and 114 Home wins . We can now calculate Pep Home winning , losing and drawing probability .
By calculating the probability using the equation \(P(A) = {n(S) \over n(A)}\) Where:
\(P(A)\) is the probability of an outcome
\(n(S)\) is the number of favorable outcomes for event \(A\).
\(n(S)\) is the total number of possible outcomes in the sample space.
pep_home_point$Probability <- pep_home_point$n/146
pep_home_point
## # A tibble: 3 × 4
## # Groups: FTR [3]
## FTR n points Probability
## <chr> <int> <dbl> <dbl>
## 1 AWAY_WINS 12 0 0.0822
## 2 DRAWS 20 20 0.137
## 3 HOME_WINS 114 342 0.781
We see from the probability that pep has \(0.08\) of losing a home game , \(0.14\) drawing at home and \(0.78\) probability of wining at home.
We move to pep probability of away games
pep_away_point %>% ggplot(mapping = aes(x=FTR,y=n))+geom_col()+labs(title = "Pep Guardiola's number of away wins , loses and draws")
We will now calculate his home wining probability.
pep_away_point$probablity<- pep_away_point$n/145
pep_away_point
## # A tibble: 3 × 4
## # Groups: FTR [3]
## FTR n points probablity
## <chr> <int> <dbl> <dbl>
## 1 AWAY_WINS 100 300 0.690
## 2 DRAWS 19 19 0.131
## 3 HOME_WINS 26 0 0.179
We see that Pep probability of wining away games is\(0.689\), drawing \(0.131\) and losing of \(0.18\).
We now move to Liverpool coach Jürgen Klopp
Klopp took over as manager of Liverpool in 2015 and has since led the team to significant success. Under his guidance, Liverpool reached the UEFA Champions League finals in 2018 and 2022, securing victory in 2019, their sixth title in the competition. In the 2018-19 Premier League season, Liverpool finished second with 97 points, the third-highest total in English top division history. Klopp’s team then won the UEFA Super Cup and FIFA Club World Cup in the following season. In 2019-20, Klopp led Liverpool to their first Premier League title, setting a club record of 99 points and breaking various top-flight records. His achievements earned him consecutive FIFA Coach of the Year awards in 2019 and 2020
## new subset with home results of Liverpool during the period of Klopp
klopp_home<-subset(PL_r,Home=="Liverpool"&Season_End_Year>=2015)%>%
arrange(Season_End_Year,Wk)
head(klopp_home)
## Season_End_Year Wk Date Home HomeGoals AwayGoals Away
## 1 2015 1 2014-08-17 Liverpool 2 1 Southampton
## 2 2015 4 2014-09-13 Liverpool 0 1 Aston Villa
## 3 2015 6 2014-09-27 Liverpool 1 1 Everton
## 4 2015 7 2014-10-04 Liverpool 2 1 West Brom
## 5 2015 9 2014-10-25 Liverpool 0 0 Hull City
## 6 2015 11 2014-11-08 Liverpool 1 2 Chelsea
## FTR
## 1 HOME_WINS
## 2 AWAY_WINS
## 3 DRAWS
## 4 HOME_WINS
## 5 DRAWS
## 6 AWAY_WINS
## new subset with away results of Manchester City during the period of klopp
klopp_away<-subset(PL_r,Away=="Manchester City"&Season_End_Year>=2015)%>%
arrange(Season_End_Year,Wk)
head(klopp_away)
## Season_End_Year Wk Date Home HomeGoals AwayGoals
## 1 2015 1 2014-08-17 Newcastle Utd 0 2
## 2 2015 4 2014-09-13 Arsenal 2 2
## 3 2015 6 2014-09-27 Hull City 2 4
## 4 2015 7 2014-10-04 Aston Villa 0 2
## 5 2015 9 2014-10-25 West Ham 2 1
## 6 2015 11 2014-11-08 QPR 2 2
## Away FTR
## 1 Manchester City AWAY_WINS
## 2 Manchester City DRAWS
## 3 Manchester City AWAY_WINS
## 4 Manchester City AWAY_WINS
## 5 Manchester City HOME_WINS
## 6 Manchester City DRAWS
## Pep points at home ground
klopp_home_point<-klopp_home%>%
group_by(FTR)%>%
count(FTR)
klopp_home_point$points<-case_when(klopp_home_point$FTR=="HOME_WINS"~klopp_home_point$n*3,
klopp_home_point$FTR=="DRAWS"~klopp_home_point$n*1,
klopp_home_point$FTR=="AWAY_WINS"~klopp_home_point$n*0
)
head(klopp_home_point)
## # A tibble: 3 × 3
## # Groups: FTR [3]
## FTR n points
## <chr> <int> <dbl>
## 1 AWAY_WINS 16 0
## 2 DRAWS 42 42
## 3 HOME_WINS 125 375
## Pep points at away grounds
klopp_away_point<-klopp_away%>%
group_by(FTR)%>%
count(FTR)
klopp_away_point$points<-case_when(klopp_away_point$FTR=="AWAY_WINS"~klopp_away_point$n*3,
klopp_away_point$FTR=="HOME_WINS"~klopp_away_point$n*0,
klopp_away_point$FTR=="DRAWS"~klopp_away_point$n*1
)
head(klopp_away_point)
## # A tibble: 3 × 3
## # Groups: FTR [3]
## FTR n points
## <chr> <int> <dbl>
## 1 AWAY_WINS 117 351
## 2 DRAWS 30 30
## 3 HOME_WINS 36 0
## pep goals:
##home
klopp_goals_home<-klopp_home%>%
group_by(Home)%>%
summarize(all_home_goal=sum(klopp_home$HomeGoals),all_home_goal_against=sum(klopp_home$AwayGoals))
head(klopp_goals_home)
## # A tibble: 1 × 3
## Home all_home_goal all_home_goal_against
## <chr> <int> <int>
## 1 Liverpool 417 152
##away
klopp_goals_away<-klopp_away%>%
group_by(Away)%>%
summarize(all_away_goal=sum(klopp_away$AwayGoals),all_away_goal_against=sum(klopp_away$HomeGoals))
head(klopp_goals_home)
## # A tibble: 1 × 3
## Home all_home_goal all_home_goal_against
## <chr> <int> <int>
## 1 Liverpool 417 152
We no use the visual to the number of wins draws and losses for Klopp
klopp_home_point %>% ggplot(mapping=aes(x=FTR,y=n))+geom_col()+ggtitle("Klopps games won,drawed and lost at Home")+xlab("Full time home Resulty")+ylab("Number of home games")
Lets now calculate klopps probablity of wining home , Draw and losing at home
klopp_home_point$Probability <- klopp_home_point$n/183
klopp_home_point
## # A tibble: 3 × 4
## # Groups: FTR [3]
## FTR n points Probability
## <chr> <int> <dbl> <dbl>
## 1 AWAY_WINS 16 0 0.0874
## 2 DRAWS 42 42 0.230
## 3 HOME_WINS 125 375 0.683
We see that Klopp has 0.68 chances of winning home ,0.23 of drawing and 0.0874 of losing
Let now move to away games for klopp
klopp_away_point %>% ggplot(mapping = aes(x=FTR,y=n))+geom_col()+ggtitle("Klopps aways games")+xlab("Full time away Resulty")+ylab("Number of away games")
Lets now calculate the probability of klopp wining away games
klopp_away_point$Probabilty<-klopp_away_point$n/183
klopp_away_point
## # A tibble: 3 × 4
## # Groups: FTR [3]
## FTR n points Probabilty
## <chr> <int> <dbl> <dbl>
## 1 AWAY_WINS 117 351 0.639
## 2 DRAWS 30 30 0.164
## 3 HOME_WINS 36 0 0.197
We see that Klopp away wining is \(0.64\), Draws is \(0.16\),losing \(0.197\) away games .
We now move to Chelsea manger José Mourinho.
Jose Mourinho became the manager of Chelsea twice, from 2004 to 2007 and then again from 2013 to 2015. His first unveiling at Stamford Bridge on June 2, 2004, marked the arrival of the most accomplished manager in Chelsea’s history. At just 41 years old, the Portuguese tactician had already secured major trophies including the European Cup, UEFA Cup, and domestic league title in his native Portugal, showcasing his remarkable success and pedigree in football management.
## new subset with home results of Chelsea during the first period of Jose Mourinho
jose1_home<-subset(PL_r,Home=="Chelsea"& Date>="2004-08-14"&Date<="2007-09-15")%>%
arrange(Season_End_Year,Wk)
head(jose1_home)
## Season_End_Year Wk Date Home HomeGoals AwayGoals Away
## 1 2005 1 2004-08-15 Chelsea 1 0 Manchester Utd
## 2 2005 4 2004-08-28 Chelsea 2 1 Southampton
## 3 2005 6 2004-09-19 Chelsea 0 0 Tottenham
## 4 2005 8 2004-10-03 Chelsea 1 0 Liverpool
## 5 2005 10 2004-10-23 Chelsea 4 0 Blackburn
## 6 2005 12 2004-11-06 Chelsea 1 0 Everton
## FTR
## 1 HOME_WINS
## 2 HOME_WINS
## 3 DRAWS
## 4 HOME_WINS
## 5 HOME_WINS
## 6 HOME_WINS
## new subset with away results of Chelsea during the first period of Jose Mourinho
jose1_away<-subset(PL_r,Away=="Chelsea"& Date>="2004-08-14"&Date<="2007-09-15")%>%
arrange(Season_End_Year,Wk)
head(jose1_away)
## Season_End_Year Wk Date Home HomeGoals AwayGoals Away
## 1 2005 2 2004-08-21 Birmingham City 0 1 Chelsea
## 2 2005 3 2004-08-24 Crystal Palace 0 2 Chelsea
## 3 2005 5 2004-09-11 Aston Villa 0 0 Chelsea
## 4 2005 7 2004-09-25 Middlesbrough 0 1 Chelsea
## 5 2005 9 2004-10-16 Manchester City 1 0 Chelsea
## 6 2005 11 2004-10-30 West Brom 1 4 Chelsea
## FTR
## 1 AWAY_WINS
## 2 AWAY_WINS
## 3 DRAWS
## 4 AWAY_WINS
## 5 HOME_WINS
## 6 AWAY_WINS
##jose points at home ground
jose1_home_point<-jose1_home%>%
group_by(FTR)%>%
count(FTR)
jose1_home_point$points<-case_when(jose1_home_point$FTR=="HOME_WINS"~jose1_home_point$n*3,
jose1_home_point$FTR=="DRAWS"~jose1_home_point$n*1,
jose1_home_point$FTR=="AWAY_WINS"~jose1_home_point$n*0
)
head(jose1_home_point)
## # A tibble: 2 × 3
## # Groups: FTR [2]
## FTR n points
## <chr> <int> <dbl>
## 1 DRAWS 14 14
## 2 HOME_WINS 46 138
##jose points at away grounds
jose1_away_point<-jose1_away%>%
group_by(FTR)%>%
count(FTR)
jose1_away_point$points<-case_when(jose1_away_point$FTR=="AWAY_WINS"~jose1_away_point$n*3,
jose1_away_point$FTR=="HOME_WINS"~jose1_away_point$n*0,
jose1_away_point$FTR=="DRAWS"~jose1_away_point$n*1
)
jose1_away_point
## # A tibble: 3 × 3
## # Groups: FTR [3]
## FTR n points
## <chr> <int> <dbl>
## 1 AWAY_WINS 39 117
## 2 DRAWS 11 11
## 3 HOME_WINS 10 0
## jose goals:
##home
jose1_goals_home<-jose1_home%>%
group_by(Home)%>%
summarize(all_home_goal=sum(jose1_home$HomeGoals),all_home_goal_against=sum(jose1_home$AwayGoals))
head(jose1_goals_home)
## # A tibble: 1 × 3
## Home all_home_goal all_home_goal_against
## <chr> <int> <int>
## 1 Chelsea 123 28
##away
jose1_goals_away<-jose1_away%>%
group_by(Away)%>%
summarize(all_away_goal=sum(jose1_away$AwayGoals),all_away_goal_against=sum(jose1_away$HomeGoals))
head(jose1_goals_away)
## # A tibble: 1 × 3
## Away all_away_goal all_away_goal_against
## <chr> <int> <int>
## 1 Chelsea 92 39
We now plot and see the number of wins on home games by Jose at home
jose1_home_point %>% ggplot(mapping = aes(x=FTR,y=n))+geom_col()+ggtitle("Jose Home games")+xlab("Full time away Resulty")+ylab("Number of away games")
Lets now calculate the probability of Jose wining home games
jose1_home_point$Probabity<-jose1_home_point$n/60
jose1_home_point
## # A tibble: 2 × 4
## # Groups: FTR [2]
## FTR n points Probabity
## <chr> <int> <dbl> <dbl>
## 1 DRAWS 14 14 0.233
## 2 HOME_WINS 46 138 0.767
We see from the probability that jose had \(0.766\) wining home games and \(0.233\) of drawing
Let now move to away games
jose1_away_point %>% ggplot(mapping = aes(x=FTR,y=n))+geom_col()+ggtitle("Jose away games")
Let now calculate the probability of jose’s away gamws
jose1_away_point$Probability<-jose1_away_point$n/60
jose1_away_point
## # A tibble: 3 × 4
## # Groups: FTR [3]
## FTR n points Probability
## <chr> <int> <dbl> <dbl>
## 1 AWAY_WINS 39 117 0.65
## 2 DRAWS 11 11 0.183
## 3 HOME_WINS 10 0 0.167
We see that jose has \(0.65\) Chances of wining aways games and \(0.167\) of losing .
Conclusion on the best coach based on the probability of wining home and away.
By comparing the probability of the above Coaches we see that pep has the highest chance of winning home games which is \(0.78\) followed by Jose with the probability of \(0.767\) Thus we can conclude that on home games pep is the best coach . lets now consider away game , and see the best performing coach for away games .We see that for away games pep has the highest probability which is \(0.689\) followed by klopp with the probability of \(0.683\). And we can conclude that Pep is the best coach in EPL.
# Creating a tibble
my_table <- tibble(Coach_name =c("Sir_Alex","Pep","Klopp","Jose"),Prob_of_wining_away_games= c(0.55,0.689,0.683,0.65),prob_of_home_games=c(0.75,0.781,0.68,0.767) )
my_table
## # A tibble: 4 × 3
## Coach_name Prob_of_wining_away_games prob_of_home_games
## <chr> <dbl> <dbl>
## 1 Sir_Alex 0.55 0.75
## 2 Pep 0.689 0.781
## 3 Klopp 0.683 0.68
## 4 Jose 0.65 0.767
my_table$sum_of_prob<-my_table$Prob_of_wining_away_games+my_table$prob_of_home_games
my_table
## # A tibble: 4 × 4
## Coach_name Prob_of_wining_away_games prob_of_home_games sum_of_prob
## <chr> <dbl> <dbl> <dbl>
## 1 Sir_Alex 0.55 0.75 1.3
## 2 Pep 0.689 0.781 1.47
## 3 Klopp 0.683 0.68 1.36
## 4 Jose 0.65 0.767 1.42
my_table %>% ggplot(mapping = aes(x=Coach_name,y=sum_of_prob))+geom_col()+coord_flip()+ggtitle("The best Coach in the Premier Legue ")