Summary:

FiveThirtyEight’s “Club Soccer Predictions” is a forecasting model for club soccer leagues around the world. It is built on a revised version of ESPN’s SPI (Soccer Power Index) ratings, and the data used here covers seasons 2016 to 2022. For each match, the model gives the probability that either team wins or that the game ends in a draw.

Links to the articles:

https://projects.fivethirtyeight.com/soccer-predictions/

https://fivethirtyeight.com/methodology/how-our-club-soccer-predictions-work/

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Load the match data from a CSV copy hosted on my GitHub
data <- read.csv("https://raw.githubusercontent.com/Nick-Climaco/Rdataset/main/spi_matches.csv")
head(data)
##   season       date league_id                  league            team1
## 1   2016 2016-07-09      7921 FA Women's Super League  Liverpool Women
## 2   2016 2016-07-10      7921 FA Women's Super League    Arsenal Women
## 3   2016 2016-07-10      7921 FA Women's Super League Chelsea FC Women
## 4   2016 2016-07-16      7921 FA Women's Super League  Liverpool Women
## 5   2016 2016-07-17      7921 FA Women's Super League Chelsea FC Women
## 6   2016 2016-07-24      7921 FA Women's Super League          Reading
##                 team2  spi1  spi2  prob1  prob2 probtie proj_score1 proj_score2
## 1             Reading 51.56 50.42 0.4389 0.2767  0.2844        1.39        1.05
## 2 Notts County Ladies 46.61 54.03 0.3572 0.3608  0.2819        1.27        1.28
## 3     Birmingham City 59.85 54.64 0.4799 0.2487  0.2714        1.53        1.03
## 4 Notts County Ladies 53.00 52.35 0.4289 0.2699  0.3013        1.27        0.94
## 5       Arsenal Women 59.43 60.99 0.4124 0.3157  0.2719        1.45        1.24
## 6     Birmingham City 50.75 55.03 0.3821 0.3200  0.2979        1.22        1.09
##   importance1 importance2 score1 score2 xg1 xg2 nsxg1 nsxg2 adj_score1
## 1          NA          NA      2      0  NA  NA    NA    NA         NA
## 2          NA          NA      2      0  NA  NA    NA    NA         NA
## 3          NA          NA      1      1  NA  NA    NA    NA         NA
## 4          NA          NA      0      0  NA  NA    NA    NA         NA
## 5          NA          NA      1      2  NA  NA    NA    NA         NA
## 6          NA          NA      1      1  NA  NA    NA    NA         NA
##   adj_score2
## 1         NA
## 2         NA
## 3         NA
## 4         NA
## 5         NA
## 6         NA
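As a quick sanity check on these columns, `prob1`, `prob2`, and `probtie` should add up to roughly 1 for every match. The sketch below is only an illustration: it re-enters the six rows printed above by hand, and `first_rows`, `prob_check`, and `prob_total` are names introduced here rather than part of the original analysis.

```r
library(dplyr)

# The six rows shown by head(data), re-entered by hand for illustration
first_rows <- data.frame(
    prob1   = c(0.4389, 0.3572, 0.4799, 0.4289, 0.4124, 0.3821),
    prob2   = c(0.2767, 0.3608, 0.2487, 0.2699, 0.3157, 0.3200),
    probtie = c(0.2844, 0.2819, 0.2714, 0.3013, 0.2719, 0.2979)
)

# Total probability per match; small deviations from 1 come from rounding
prob_check <- first_rows %>%
    mutate(prob_total = prob1 + prob2 + probtie)

print(prob_check)
```

To run the same check on the full data set, replace `first_rows` with `data`.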
# Keep only the matches where a tie is more probable than a win by either team
df <- data %>%
    select(season, league, team1, team2, spi1, spi2, prob1, prob2, probtie, xg1, xg2) %>%
    rename(expected_goal1 = xg1, expected_goal2 = xg2) %>%
    filter(probtie > prob1, probtie > prob2)
head(df)
##   season                   league                team1         team2  spi1
## 1   2017     Russian Premier Liga               FC Ufa  Terek Grozny 54.31
## 2   2017     Russian Premier Liga           Amkar Perm        FC Ufa 44.64
## 3   2017 Spanish Segunda Division                 Lugo Reus Deportiu 37.15
## 4   2017            Italy Serie B               Spezia         Carpi 32.88
## 5   2017 Spanish Segunda Division        Reus Deportiu      Numancia 40.25
## 6   2017            Italy Serie B F.B.C Unione Venezia        Spezia 27.68
##    spi2  prob1  prob2 probtie expected_goal1 expected_goal2
## 1 61.04 0.3545 0.2820  0.3635             NA             NA
## 2 56.06 0.3022 0.3319  0.3659             NA             NA
## 3 44.19 0.3712 0.2286  0.4002             NA             NA
## 4 39.27 0.3256 0.2775  0.3969             NA             NA
## 5 41.33 0.3830 0.2070  0.4100             NA             NA
## 6 31.82 0.3562 0.2789  0.3650             NA             NA
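
The filter above keeps the tie-favoured matches, but it does not say how common they are. One way to measure that is to count the share of matches where `probtie` is the largest of the three probabilities. The sketch below assumes a small hand-entered sample (the six rows from `head(data)` plus the first two rows from `head(df)`), so the resulting share is not representative of the full data set; `sample_matches` and `tie_summary` are names introduced here.

```r
library(dplyr)

# Hand-entered sample: six rows from head(data), then two rows from head(df)
sample_matches <- data.frame(
    prob1   = c(0.4389, 0.3572, 0.4799, 0.4289, 0.4124, 0.3821, 0.3545, 0.3022),
    prob2   = c(0.2767, 0.3608, 0.2487, 0.2699, 0.3157, 0.3200, 0.2820, 0.3319),
    probtie = c(0.2844, 0.2819, 0.2714, 0.3013, 0.2719, 0.2979, 0.3635, 0.3659)
)

# Count and share of matches where a tie is the single most likely outcome
tie_summary <- sample_matches %>%
    summarise(
        n_matches          = n(),
        n_tie_favoured     = sum(probtie > prob1 & probtie > prob2, na.rm = TRUE),
        share_tie_favoured = mean(probtie > prob1 & probtie > prob2, na.rm = TRUE)
    )

print(tie_summary)
```

Replacing `sample_matches` with `data` would give the share across all matches in the file.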

Conclusion and Findings:

One interesting observation from the data is how rarely a tie is the single most probable outcome of a match. As further analysis, we could examine how each team’s SPI affects its likelihood of winning a match, regardless of whether it is the home or away team. It would also be interesting to compare the teams with the best SPI across different leagues and even different regions of the world.
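
A possible starting point for the SPI analysis is to compare the rating gap between the two teams with the gap between their win probabilities. The sketch below is a rough illustration under stated assumptions: it uses only the six rows printed by `head(data)`, re-entered by hand, and the names `rating_rows`, `spi_gap`, `spi_diff`, `prob_diff`, and `gap_cor` are introduced here. Six rows from a single league are far too few to draw conclusions from.

```r
library(dplyr)

# The six rows shown by head(data), re-entered by hand for illustration
rating_rows <- data.frame(
    spi1  = c(51.56, 46.61, 59.85, 53.00, 59.43, 50.75),
    spi2  = c(50.42, 54.03, 54.64, 52.35, 60.99, 55.03),
    prob1 = c(0.4389, 0.3572, 0.4799, 0.4289, 0.4124, 0.3821),
    prob2 = c(0.2767, 0.3608, 0.2487, 0.2699, 0.3157, 0.3200)
)

# Rating gap and win-probability gap, both from team1's point of view
spi_gap <- rating_rows %>%
    mutate(
        spi_diff  = spi1 - spi2,
        prob_diff = prob1 - prob2
    )

# Pearson correlation between the two gaps
gap_cor <- cor(spi_gap$spi_diff, spi_gap$prob_diff)
print(gap_cor)
```

On the full data set, the same `mutate()` step could be applied to `data` and followed by `group_by(league)` to compare leagues.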