I selected the SPI ratings data set for soccer clubs. SPI stands for Soccer Power Index, a rating system designed to rank soccer clubs by overall strength. It is also used to identify the best teams based on their offensive and defensive strength. The data set contains 641 club teams ranked from 1 (the best) to 641 (the worst), with each team's average offensive and defensive rate per game and its overall SPI rating. The data can be retrieved from the link below: <https://projects.fivethirtyeight.com/soccer-api/club/spi_global_rankings.csv>.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(hrbrthemes)
## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
## Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
## if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow
library(kableExtra)
##
## Attaching package: 'kableExtra'
##
## The following object is masked from 'package:dplyr':
##
## group_rows
library(gt)
club <- read.csv("https://projects.fivethirtyeight.com/soccer-api/club/spi_global_rankings.csv", sep = ',',
                 stringsAsFactors = FALSE)
head(club,10)
## rank prev_rank name league off def
## 1 1 1 Manchester City Barclays Premier League 2.79 0.28
## 2 2 2 Bayern Munich German Bundesliga 3.04 0.68
## 3 3 3 Barcelona Spanish Primera Division 2.45 0.43
## 4 4 4 Real Madrid Spanish Primera Division 2.56 0.60
## 5 5 5 Liverpool Barclays Premier League 2.63 0.67
## 6 6 6 Arsenal Barclays Premier League 2.53 0.61
## 7 7 7 Newcastle Barclays Premier League 2.38 0.53
## 8 8 8 Napoli Italy Serie A 2.30 0.51
## 9 9 9 Borussia Dortmund German Bundesliga 2.83 0.84
## 10 10 10 Brighton and Hove Albion Barclays Premier League 2.47 0.73
## spi
## 1 92.00
## 2 87.66
## 3 86.40
## 4 84.41
## 5 83.93
## 6 83.92
## 7 83.70
## 8 83.25
## 9 82.91
## 10 80.88
# Understanding the variables in the data set
summary(club)
## rank prev_rank name league
## Min. : 1 Min. : 1 Length:641 Length:641
## 1st Qu.:161 1st Qu.:161 Class :character Class :character
## Median :321 Median :321 Mode :character Mode :character
## Mean :321 Mean :321
## 3rd Qu.:481 3rd Qu.:481
## Max. :641 Max. :641
## off def spi
## Min. :0.200 Min. :0.280 Min. : 4.86
## 1st Qu.:0.850 1st Qu.:1.180 1st Qu.:26.68
## Median :1.180 Median :1.460 Median :38.88
## Mean :1.213 Mean :1.479 Mean :40.27
## 3rd Qu.:1.530 3rd Qu.:1.760 3rd Qu.:52.11
## Max. :3.040 Max. :2.860 Max. :92.00
Let's remove the columns we don't need and rename some of the others so the data set is easier to understand.
We saved the data to our GitHub repository and reload it from there, as instructed.
club <- read.csv("https://raw.githubusercontent.com/joewarner89/CUNY-607/main/homeworks/Assignment%201/data/spi_global_rankings.csv",
                 stringsAsFactors = FALSE, header = TRUE, sep = ',')
club <- club %>% select(-contains("prev_rank"))
head(club)
## rank name league off def spi
## 1 1 Manchester City Barclays Premier League 2.79 0.28 92.00
## 2 2 Bayern Munich German Bundesliga 3.04 0.68 87.66
## 3 3 Barcelona Spanish Primera Division 2.45 0.43 86.40
## 4 4 Real Madrid Spanish Primera Division 2.56 0.60 84.41
## 5 5 Liverpool Barclays Premier League 2.63 0.67 83.93
## 6 6 Arsenal Barclays Premier League 2.53 0.61 83.92
club <- club %>% rename(club_team = name,
offensive_rate = off,
defensive_rate = def
)
club$power_class <- as.factor(ifelse(club$spi>= .01 & club$spi <= 29.99, 'Worst Rating Team',
ifelse(club$spi >= 30 & club$spi <= 39.99, 'Average Team',
ifelse(club$spi >= 40 & club$spi <= 75.99 , 'Good Team',
ifelse(club$spi >=76 & club$spi <= 82.99, 'Potential World Class Team',
ifelse(club$spi >=83 & club$spi <= 100, 'World Class Team', 'Unknown'))))))
head(club,10)
## rank club_team league offensive_rate
## 1 1 Manchester City Barclays Premier League 2.79
## 2 2 Bayern Munich German Bundesliga 3.04
## 3 3 Barcelona Spanish Primera Division 2.45
## 4 4 Real Madrid Spanish Primera Division 2.56
## 5 5 Liverpool Barclays Premier League 2.63
## 6 6 Arsenal Barclays Premier League 2.53
## 7 7 Newcastle Barclays Premier League 2.38
## 8 8 Napoli Italy Serie A 2.30
## 9 9 Borussia Dortmund German Bundesliga 2.83
## 10 10 Brighton and Hove Albion Barclays Premier League 2.47
## defensive_rate spi power_class
## 1 0.28 92.00 World Class Team
## 2 0.68 87.66 World Class Team
## 3 0.43 86.40 World Class Team
## 4 0.60 84.41 World Class Team
## 5 0.67 83.93 World Class Team
## 6 0.61 83.92 World Class Team
## 7 0.53 83.70 World Class Team
## 8 0.51 83.25 World Class Team
## 9 0.84 82.91 Potential World Class Team
## 10 0.73 80.88 Potential World Class Team
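The nested ifelse() calls above can also be written with base R's cut(), which bins a numeric vector by a set of breakpoints. The sketch below is an alternative, not the code used for this report. The helper name classify_spi is hypothetical, and the breakpoints are copied from the ranges above, written as half-open intervals so that a value between two ranges (such as 29.995) still gets a label.

```r
# Hypothetical helper: bin SPI values into the same five power classes
classify_spi <- function(spi) {
  cut(
    spi,
    breaks = c(0, 30, 40, 76, 83, 100),
    labels = c("Worst Rating Team", "Average Team", "Good Team",
               "Potential World Class Team", "World Class Team"),
    right = FALSE,          # intervals are [lower, upper)
    include.lowest = TRUE   # with right = FALSE, this closes the last interval at 100
  )
}

# SPI values taken from the output above
classify_spi(c(92.00, 83.25, 82.91, 80.88))
```

Values outside the 0 to 100 range come back as NA here instead of 'Unknown'.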
The variable spi determines the power class of the team: the higher the SPI, the better the team. Manchester City ranks #1 because it has the highest SPI, 92.00.
The top 20 teams of the 2022-2023 season:
# select only top 20 teams based on ordered spi in the data set
top_20 <- head(club,20)
top_20 %>%
ggplot( aes(x=club_team, y=spi) ) +
geom_bar(stat="identity", fill="#69b3a2") +
coord_flip() +
theme_ipsum() +
theme(
panel.grid.minor.y = element_blank(),
panel.grid.major.y = element_blank(),
legend.position="none"
) +
xlab("Top 20 Teams Playing World Class Football") + ggtitle("Highest Rated Soccer Clubs")+
ylab("Soccer Power Index(Best Team for 2022-2023 season)")
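Because club_team is a character column, ggplot2 sorts the bars alphabetically rather than by rating. One possible way to order them by SPI is to wrap the team name in reorder(), as in this sketch. The small data frame top_5 is made up for illustration from the first five rows printed earlier.

```r
library(ggplot2)

# Illustrative subset built from the first five rows printed earlier
top_5 <- data.frame(
  club_team = c("Manchester City", "Bayern Munich", "Barcelona",
                "Real Madrid", "Liverpool"),
  spi = c(92.00, 87.66, 86.40, 84.41, 83.93)
)

# reorder() turns club_team into a factor whose levels follow ascending spi,
# so after coord_flip() the highest-rated club appears at the top
p <- ggplot(top_5, aes(x = reorder(club_team, spi), y = spi)) +
  geom_col(fill = "#69b3a2") +  # geom_col() is equivalent to geom_bar(stat = "identity")
  coord_flip() +
  labs(x = "Club", y = "Soccer Power Index")

print(p)
```

The same reorder() call should work unchanged inside the aes() of the top_20 and worst_20 plots.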
Let's look at the 20 teams with the lowest ratings.
worst_20 <- tail(club,20)
worst_20 %>%
ggplot( aes(x=club_team, y=spi) ) +
geom_bar(stat="identity", fill="#69b3a2") +
coord_flip() +
theme_ipsum() +
theme(
panel.grid.minor.y = element_blank(),
panel.grid.major.y = element_blank(),
legend.position="none"
) +
xlab("Bottom 20 Teams") + ggtitle("Lowest Rated Soccer Clubs")+
ylab("Soccer Power Index (2022-2023 season)")
During the 2022-2023 season, many teams improved their goals-per-game ratio and conceded fewer goals. The SPI data set provides overall estimates for the top soccer clubs in the world. Let's look at the best offensive and defensive teams.
# best offensive teams and defensive teams
off <-
club %>% arrange(desc(offensive_rate)) %>% head(10) %>% select(club_team, offensive_rate)
# Top 10 Offensive Team
gt(off) %>%
tab_header(
title = "Best Offensive Team for the 2022-2023 Season",
subtitle = "Highest scoring Team for 2022-2023 Season "
)
Best Offensive Team for the 2022-2023 Season | |
Highest scoring Team for 2022-2023 Season | |
club_team | offensive_rate |
---|---|
Bayern Munich | 3.04 |
Borussia Dortmund | 2.83 |
Manchester City | 2.79 |
Ajax | 2.66 |
Liverpool | 2.63 |
Paris Saint-Germain | 2.62 |
Real Madrid | 2.56 |
Arsenal | 2.53 |
Celtic | 2.53 |
Brighton and Hove Albion | 2.47 |
# Best Defense in Europe
deff <-
club %>% arrange(defensive_rate) %>% head(10) %>% select(club_team, defensive_rate)
# Top 10 Defensive Team
gt(deff) %>%
tab_header(
title = "Best Defensive Team for the 2022-2023 Season",
subtitle = "Lowest Defensive Rate for 2022-2023 Season"
)
Best Defensive Team for the 2022-2023 Season | |
Lowest Defensive Rate for 2022-2023 Season | |
club_team | defensive_rate |
---|---|
Manchester City | 0.28 |
Barcelona | 0.43 |
Napoli | 0.51 |
Aston Villa | 0.51 |
Real Sociedad | 0.52 |
Newcastle | 0.53 |
Athletic Bilbao | 0.59 |
Real Madrid | 0.60 |
Arsenal | 0.61 |
Crystal Palace | 0.62 |
Let's look at the relationship between these variables.
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
# creating only numerical variables
corr <- club %>% select(rank,offensive_rate,defensive_rate,spi)
corPlot(corr[,1:4], main = "Correlation of Team Statistic")
corres <- round(cor(corr), 2)
# Convert the correlation matrix to a data frame before passing it to gt
gt(data.frame(corres)) %>%
tab_header(
title = "Correlation Of All the Features for the 2022-2023 Season",
subtitle = " Relationship of All Soccer Statistics"
)
Correlation Of All the Features for the 2022-2023 Season | |||
Relationship of All Soccer Statistics | |||
rank | offensive_rate | defensive_rate | spi |
---|---|---|---|
1.00 | -0.94 | 0.90 | -0.99 |
-0.94 | 1.00 | -0.77 | 0.96 |
0.90 | -0.77 | 1.00 | -0.91 |
-0.99 | 0.96 | -0.91 | 1.00 |
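The correlation table above has no row labels because gt() does not display data frame row names by default. A minimal sketch of one way to keep them, using base R only, is to copy the row names into an ordinary column before building the table. The data frame stats and the column name variable are illustrative; the values come from the first six rows printed earlier, so the correlations will differ from those of the full data set.

```r
# Illustrative data: first six rows of the ranking shown earlier
stats <- data.frame(
  rank = 1:6,
  offensive_rate = c(2.79, 3.04, 2.45, 2.56, 2.63, 2.53),
  defensive_rate = c(0.28, 0.68, 0.43, 0.60, 0.67, 0.61),
  spi = c(92.00, 87.66, 86.40, 84.41, 83.93, 83.92)
)

corres <- round(cor(stats), 2)

# Copy the row names into a regular column so they survive in the table
corr_table <- data.frame(variable = rownames(corres), corres)
rownames(corr_table) <- NULL  # drop the now-redundant row names

corr_table
```

corr_table can then be passed to gt() in place of data.frame(corres). gt() also has a rownames_to_stub argument intended for the same purpose.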
What the data suggest is that a strong defense appears essential to reaching the top: defensive rate is strongly correlated with rank (0.90), so teams that concede fewer goals tend to rank higher. Manchester City, ranked number 1, has the lowest defensive rate in the data set, meaning it conceded fewer goals per game than any other team. They have won the UEFA Champions League, Premier League, FA Cup, Community Shield and UEFA Super Cup. The Soccer Power Index represents a team's overall strength on a scale up to 100. SPI combines both the defensive and offensive ratings, and the team with the highest SPI occupies rank 1 as the best team in the world.
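To illustrate how SPI moves with the two component ratings, the sketch below fits an ordinary linear regression of spi on the offensive and defensive rates for the ten teams printed at the top of this report. This is only an exploratory approximation on a tiny sample, not FiveThirtyEight's actual SPI formula, and the data frame name top_10 is illustrative.

```r
# Values copied from the first ten rows printed at the top of the report
top_10 <- data.frame(
  offensive_rate = c(2.79, 3.04, 2.45, 2.56, 2.63, 2.53, 2.38, 2.30, 2.83, 2.47),
  defensive_rate = c(0.28, 0.68, 0.43, 0.60, 0.67, 0.61, 0.53, 0.51, 0.84, 0.73),
  spi = c(92.00, 87.66, 86.40, 84.41, 83.93, 83.92, 83.70, 83.25, 82.91, 80.88)
)

# Linear model: how much does SPI change per unit of each rate?
fit <- lm(spi ~ offensive_rate + defensive_rate, data = top_10)

coef(fit)
```

A positive coefficient on offensive_rate and a negative one on defensive_rate would be consistent with the correlations reported above. Replacing top_10 with the full club data frame would give estimates based on all 641 teams.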