Introduction

The data were taken on 08/27/2021 from Club Soccer Predictions, a project that forecasts matches and generates soccer power ratings for clubs in 39 leagues around the world. The project’s model evaluates the performance of each soccer club and assigns it three attributes: the number of goals the club would be expected to score against an average team, the number of goals it would be expected to allow against an average team, and an overall Soccer Power Index (spi), which represents the percentage of available points the team is expected to take. The project updates its model after each match is played to refine the teams’ attributes. The project then uses each club’s attributes to forecast the probabilities of teams winning their league, earning a spot in the Champions League, or getting relegated, as well as other outcomes.

Load data

Load the two CSV files (club rankings and match projections) from GitHub into data frames.

club_rankings_csv <- 'https://raw.githubusercontent.com/dab31415/DATA607/main/W1_spi_global_rankings.csv'
club_matches_csv <- 'https://raw.githubusercontent.com/dab31415/DATA607/main/W1_spi_matches.csv'

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.2     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
club_rankings_df <- read_csv(club_rankings_csv)
## 
## -- Column specification --------------------------------------------------------
## cols(
##   rank = col_double(),
##   prev_rank = col_double(),
##   name = col_character(),
##   league = col_character(),
##   off = col_double(),
##   def = col_double(),
##   spi = col_double()
## )
club_matches_df <- read_csv(club_matches_csv)
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   date = col_date(format = ""),
##   league = col_character(),
##   team1 = col_character(),
##   team2 = col_character()
## )
## i Use `spec()` for the full column specifications.

Filtering

The original data source contains the teams from 39 leagues. Filter the dataset down to the twenty teams in the Barclays Premier League.

pl_rankings_df <- filter(club_rankings_df,league == 'Barclays Premier League')

# Add league_rank Column
pl_rankings_df <- mutate(pl_rankings_df, league_rank = min_rank(desc(spi)))

# Select columns
(pl_rankings_df <- select(pl_rankings_df,league_rank,name,off,def,spi))
## # A tibble: 20 x 5
##    league_rank name                       off   def   spi
##          <int> <chr>                    <dbl> <dbl> <dbl>
##  1           1 Manchester City           2.76  0.25  92.2
##  2           2 Chelsea                   2.46  0.27  89.6
##  3           3 Liverpool                 2.64  0.41  88.7
##  4           4 Manchester United         2.43  0.46  85.7
##  5           5 Tottenham Hotspur         2.26  0.69  78.8
##  6           6 Arsenal                   2.07  0.61  78.0
##  7           7 West Ham United           2.18  0.79  75.6
##  8           8 Leicester City            2.05  0.72  75.2
##  9           9 Brighton and Hove Albion  1.9   0.63  74.8
## 10          10 Everton                   1.96  0.71  74.0
## 11          11 Aston Villa               1.99  0.81  72.1
## 12          12 Wolverhampton             1.8   0.69  71.5
## 13          13 Leeds United              1.99  0.87  70.6
## 14          14 Southampton               1.81  0.95  65.3
## 15          15 Newcastle                 1.84  1.01  64.5
## 16          16 Brentford                 1.66  0.87  64.3
## 17          17 Burnley                   1.76  0.98  63.7
## 18          18 Crystal Palace            1.64  0.94  62.2
## 19          19 Watford                   1.61  0.98  60.5
## 20          20 Norwich City              1.58  1     59.6
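
The three steps above can also be chained into a single pipeline. The following is a minimal sketch of that idea, not a replacement for the code above. So that it runs on its own, it uses a small inline tibble (`rankings_demo`, a hypothetical name) in place of the downloaded file. The three Premier League rows copy values from the table above; the `Example FC` row and its numbers are hypothetical placeholders, added only to show the filter removing another league.

```r
library(dplyr)
library(tibble)

# Stand-in for club_rankings_df: three rows copied from the output above,
# plus one hypothetical non-Premier-League row (placeholder values).
rankings_demo <- tibble(
  name   = c("Chelsea", "Example FC", "Manchester City", "Liverpool"),
  league = c("Barclays Premier League", "Other League",
             "Barclays Premier League", "Barclays Premier League"),
  off    = c(2.46, 2.00, 2.76, 2.64),
  def    = c(0.27, 0.50, 0.25, 0.41),
  spi    = c(89.6, 80.0, 92.2, 88.7)
)

pl_demo <- rankings_demo %>%
  filter(league == "Barclays Premier League") %>%  # keep one league
  mutate(league_rank = min_rank(desc(spi))) %>%    # rank 1 = highest spi
  arrange(league_rank) %>%                         # sort rows by league rank
  select(league_rank, name, off, def, spi)         # keep the display columns

pl_demo
```

The `arrange()` call is an addition: the full data set already arrives sorted by rank, so the original code does not need it, but the shuffled demo rows do.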

Match Projections

View the model’s projections for Barclays Premier League matches in the week of 8/24-8/30.

# Keep Premier League matches from the 2021 season
pl_matches_df <- filter(club_matches_df,league == 'Barclays Premier League',season == 2021)
# Rename date column to match_date
names(pl_matches_df)[names(pl_matches_df) == 'date'] <- 'match_date'
# Add match_week, with weeks starting on Tuesdays (week_start = 2)
pl_matches_df <- mutate(pl_matches_df,match_week = floor_date(match_date,'week',week_start = 2))

# Show the projected scores for the week starting 08/24/2021
(pl_thisweek <- select(filter(pl_matches_df,match_week == mdy('08/24/2021')),match_date,team1,team2,proj_score1,proj_score2))
## # A tibble: 10 x 5
##    match_date team1                    team2             proj_score1 proj_score2
##    <date>     <chr>                    <chr>                   <dbl>       <dbl>
##  1 2021-08-28 Manchester City          Arsenal                  2.13        0.74
##  2 2021-08-28 Norwich City             Leicester City           1.08        1.54
##  3 2021-08-28 West Ham United          Crystal Palace           1.89        0.93
##  4 2021-08-28 Brighton and Hove Albion Everton                  1.37        1.1 
##  5 2021-08-28 Aston Villa              Brentford                1.59        0.99
##  6 2021-08-28 Newcastle                Southampton              1.53        1.31
##  7 2021-08-28 Liverpool                Chelsea                  1.51        1.3 
##  8 2021-08-29 Tottenham Hotspur        Watford                  2.04        0.81
##  9 2021-08-29 Burnley                  Leeds United             1.39        1.46
## 10 2021-08-29 Wolverhampton            Manchester United        0.96        1.7
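
The week grouping relies on `floor_date()` rounding each date down to the most recent Tuesday. The short sketch below illustrates that behavior on its own, using the two match dates from the table above plus the following Monday and Tuesday; the variable names `dates` and `week_floor` are only for illustration.

```r
library(lubridate)

# Saturday and Sunday from the match table, then the next Monday and Tuesday.
dates <- ymd(c("2021-08-28", "2021-08-29", "2021-08-30", "2021-08-31"))

# week_start = 2 makes each week begin on Tuesday, so every date is
# rounded down to the Tuesday on or before it.
week_floor <- floor_date(dates, unit = "week", week_start = 2)

data.frame(date = dates, match_week = week_floor)
```

The first three dates fall in the week beginning 2021-08-24, while 2021-08-31 is itself a Tuesday and starts the next match week.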

Findings and Recommendations

In the 2015-16 season, Leicester City F.C. shocked the world by winning the Premier League. Almost no one predicted it before the season began. How accurate is this model, and at what point in the 2015-16 season would it have predicted that Leicester City would finish atop the league? The model has been revised over several seasons in an attempt to make it more accurate. At this point, I’m not certain how to complete such an analysis, but I look forward to following this model in future league seasons.
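
One possible starting point for the accuracy question is to compare the side favoured by the projected scores with the side that actually won. The sketch below is only an illustration of that idea under stated assumptions, not an established evaluation method. It assumes the matches data has final-score columns named `score1` and `score2`, which are not shown in the output above. Every value in `matches_demo` is a hypothetical placeholder, not a real projection or result.

```r
library(dplyr)
library(tibble)

# Hypothetical placeholder matches: NOT real projections or results.
matches_demo <- tibble(
  proj_score1 = c(2.0, 1.1, 1.5, 0.9),
  proj_score2 = c(0.8, 1.6, 1.3, 1.7),
  score1      = c(3, 1, 0, 2),  # assumed column: goals scored by team1
  score2      = c(0, 2, 0, 1)   # assumed column: goals scored by team2
)

accuracy_demo <- matches_demo %>%
  mutate(
    proj_side   = sign(proj_score1 - proj_score2),  # 1 = team1 favoured, -1 = team2
    actual_side = sign(score1 - score2),            # 1 / -1 = winner, 0 = draw
    hit         = proj_side == actual_side          # did the favoured side win?
  ) %>%
  summarise(matches = n(), hit_rate = mean(hit))

accuracy_demo
```

This measure is crude: projected scores are rarely equal, so every drawn match counts as a miss. A fuller analysis would likely work from the model's outcome probabilities rather than the projected scores.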