The FiveThirtyEight article “Club Soccer Prediction” presents a forecasting model for club soccer leagues around the world. The model is built on a revised version of ESPN’s Soccer Power Index (SPI) ratings, and the data used here covers the 2016 to 2022 seasons. For each match, the article gives the probability that a team wins, loses, or draws.
Links to the articles:
https://projects.fivethirtyeight.com/soccer-predictions/
https://fivethirtyeight.com/methodology/how-our-club-soccer-predictions-work/
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Load the match data from a copy of the CSV file hosted on my GitHub
data <- read.csv("https://raw.githubusercontent.com/Nick-Climaco/Rdataset/main/spi_matches.csv")
head(data)
## season date league_id league team1
## 1 2016 2016-07-09 7921 FA Women's Super League Liverpool Women
## 2 2016 2016-07-10 7921 FA Women's Super League Arsenal Women
## 3 2016 2016-07-10 7921 FA Women's Super League Chelsea FC Women
## 4 2016 2016-07-16 7921 FA Women's Super League Liverpool Women
## 5 2016 2016-07-17 7921 FA Women's Super League Chelsea FC Women
## 6 2016 2016-07-24 7921 FA Women's Super League Reading
## team2 spi1 spi2 prob1 prob2 probtie proj_score1 proj_score2
## 1 Reading 51.56 50.42 0.4389 0.2767 0.2844 1.39 1.05
## 2 Notts County Ladies 46.61 54.03 0.3572 0.3608 0.2819 1.27 1.28
## 3 Birmingham City 59.85 54.64 0.4799 0.2487 0.2714 1.53 1.03
## 4 Notts County Ladies 53.00 52.35 0.4289 0.2699 0.3013 1.27 0.94
## 5 Arsenal Women 59.43 60.99 0.4124 0.3157 0.2719 1.45 1.24
## 6 Birmingham City 50.75 55.03 0.3821 0.3200 0.2979 1.22 1.09
## importance1 importance2 score1 score2 xg1 xg2 nsxg1 nsxg2 adj_score1
## 1 NA NA 2 0 NA NA NA NA NA
## 2 NA NA 2 0 NA NA NA NA NA
## 3 NA NA 1 1 NA NA NA NA NA
## 4 NA NA 0 0 NA NA NA NA NA
## 5 NA NA 1 2 NA NA NA NA NA
## 6 NA NA 1 1 NA NA NA NA NA
## adj_score2
## 1 NA
## 2 NA
## 3 NA
## 4 NA
## 5 NA
## 6 NA
# Create a subset of the data in which the probability of a tie is greater
# than the probability of either team winning.
df <- data %>%
  select(season, league, team1, team2, spi1, spi2, prob1, prob2, probtie, xg1, xg2) %>%
  rename(expected_goal1 = "xg1", expected_goal2 = "xg2") %>%
  filter(probtie > prob1 & probtie > prob2)
head(df)
## season league team1 team2 spi1
## 1 2017 Russian Premier Liga FC Ufa Terek Grozny 54.31
## 2 2017 Russian Premier Liga Amkar Perm FC Ufa 44.64
## 3 2017 Spanish Segunda Division Lugo Reus Deportiu 37.15
## 4 2017 Italy Serie B Spezia Carpi 32.88
## 5 2017 Spanish Segunda Division Reus Deportiu Numancia 40.25
## 6 2017 Italy Serie B F.B.C Unione Venezia Spezia 27.68
## spi2 prob1 prob2 probtie expected_goal1 expected_goal2
## 1 61.04 0.3545 0.2820 0.3635 NA NA
## 2 56.06 0.3022 0.3319 0.3659 NA NA
## 3 44.19 0.3712 0.2286 0.4002 NA NA
## 4 39.27 0.3256 0.2775 0.3969 NA NA
## 5 41.33 0.3830 0.2070 0.4100 NA NA
## 6 31.82 0.3562 0.2789 0.3650 NA NA
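To put a number on how often a draw is the most likely outcome, one option is to compute the share of matches that pass the same filter. The following is a minimal sketch, not part of the original analysis: `tie_favoured_share()` is a hypothetical helper name, and `sample_matches` is a tiny illustrative sample built from rows printed in the `head()` outputs above, so its result says nothing about the full dataset.

```r
library(dplyr)

# Hypothetical helper (not from the article): the fraction of matches in which
# the draw probability exceeds both teams' win probabilities.
tie_favoured_share <- function(matches) {
  matches %>%
    summarise(share = mean(probtie > prob1 & probtie > prob2, na.rm = TRUE)) %>%
    pull(share)
}

# Illustrative sample only: the first three rows of head(data)
# followed by the first two rows of head(df).
sample_matches <- data.frame(
  prob1   = c(0.4389, 0.3572, 0.4799, 0.3545, 0.3022),
  prob2   = c(0.2767, 0.3608, 0.2487, 0.2820, 0.3319),
  probtie = c(0.2844, 0.2819, 0.2714, 0.3635, 0.3659)
)

share <- tie_favoured_share(sample_matches)
print(share)
```

On the full dataset the same helper could be called as `tie_favoured_share(data)`, which should agree with `nrow(df) / nrow(data)` as long as none of the probability columns contain missing values.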
One interesting observation from the data is how rarely a tie is the single most probable outcome of a match. For further analysis, we could examine how each team’s SPI affects its likelihood of winning a match, regardless of whether it is the home or away team. It would also be interesting to compare the teams with the best SPI across different leagues, and even across different regions of the world.
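One possible starting point for the SPI question is to relate the rating gap between the two teams to the gap in their win probabilities. The sketch below is an assumption-laden illustration rather than the article's method: `spi_diff`, `prob_diff`, and `sample_spi` are names introduced here, the six rows are copied from the `head(data)` output above, and it assumes `team1` is listed as the home side, so the intercept of the fitted line can be read loosely as a home-side edge at equal SPI.

```r
library(dplyr)

# Illustrative sample only: the six rows shown by head(data).
sample_spi <- data.frame(
  spi1  = c(51.56, 46.61, 59.85, 53.00, 59.43, 50.75),
  spi2  = c(50.42, 54.03, 54.64, 52.35, 60.99, 55.03),
  prob1 = c(0.4389, 0.3572, 0.4799, 0.4289, 0.4124, 0.3821),
  prob2 = c(0.2767, 0.3608, 0.2487, 0.2699, 0.3157, 0.3200)
)

# Derived columns (names introduced for this sketch):
# rating gap and win-probability gap, both as team1 minus team2.
gaps <- sample_spi %>%
  mutate(spi_diff = spi1 - spi2,
         prob_diff = prob1 - prob2)

# Pearson correlation between the rating gap and the probability gap.
spi_prob_cor <- cor(gaps$spi_diff, gaps$prob_diff)

# Simple linear fit; the intercept is the probability gap at spi_diff = 0.
fit <- lm(prob_diff ~ spi_diff, data = gaps)

print(spi_prob_cor)
print(coef(fit))
```

Replacing `sample_spi` with `data` would run the same comparison on every match. Grouping by `league` before summarising would be one way to begin the cross-league comparison.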