The Guardian publish a weekly set of questions and answers on a variety of football minutiae at The Knowledge. Fortunately, some of these are extremely tractable using R, so I thought I’d have a go at working through the archives to see if I can shed light on any of the questions.
library(rvest)
library(dplyr)
library(magrittr)
library(data.table)
library(zoo)
library(ggplot2)
#jalapic/engsoccerdata
library(engsoccerdata)
#take all of the English soccer data in the package and bind it together
england_data <- bind_rows(
  select(engsoccerdata::england, .data$home, .data$visitor, date = .data$Date),
  select(engsoccerdata::englandplayoffs, .data$home, .data$visitor, date = .data$Date),
  select(engsoccerdata::england1939, .data$home, .data$visitor, date = .data$Date)) %>%
  setDT() %>%
  #convert the date to date class
  .[, date := as.Date(date)]
#get a list of each unique team in the dataset
all_teams <- unique(c(as.character(england_data$home),
                      as.character(england_data$visitor)))
#melt the dataset by each team's matches
england_data_long <- rbindlist(lapply(all_teams, function(team) {
  england_data %>%
    .[home == team | visitor == team] %>%
    .[, matching_team := team]
})) %>%
  .[home == matching_team, other := visitor] %>%
  .[visitor == matching_team, other := home] %>%
  .[, c("date", "matching_team", "other")] %>%
  #get the suffixes and prefixes of the other team
  #(for single-word names, e.g. "Fulham", both are the whole name)
  .[, other_prefix := gsub(" .*", "", other)] %>%
  .[, other_suffix := gsub(".* ", "", other)] %>%
  #arrange by team and date
  .[order(matching_team, date)] %>%
  #convert the suffix to a numeric id
  .[, suffix_id := as.numeric(as.factor(other_suffix))] %>%
  #if playing consecutively against the same suffix id (ignoring prefixes for now) put in same 'chain'
  #match is 0 when the next opponent shares this opponent's suffix
  .[, match := suffix_id - lead(suffix_id), by = "matching_team"] %>%
  #number each chain at the last row where match is 0
  .[match == 0 & lead(match) != 0, chain_id := 1:.N] %>%
  #note: the final match of each run has match != 0 (or NA), so it is dropped here
  #and chain_length counts one fewer match than the run actually contains
  .[match == 0] %>%
  .[, chain_id := na.locf(chain_id, fromLast = TRUE)] %>%
  .[, chain_length := .N, by = chain_id] %>%
  #take only chains at least as long as Tranmere's run (6)
  .[chain_length > 5] %>%
  .[order(chain_length)] %>%
  .[, c("date", "matching_team", "other", "chain_length")]
#print the chains at least as long as Tranmere's run
print(england_data_long)
##           date  matching_team               other chain_length
##  1: 1950-12-30   Chesterfield      Leicester City            6
##  2: 1951-01-13   Chesterfield     Manchester City            6
##  3: 1951-01-20   Chesterfield       Coventry City            6
##  4: 1951-02-03   Chesterfield        Cardiff City            6
##  5: 1951-02-17   Chesterfield     Birmingham City            6
##  6: 1951-02-24   Chesterfield        Swansea City            6
##  7: 2009-03-21 Leicester City   Colchester United            6
##  8: 2009-03-28 Leicester City Peterborough United            6
##  9: 2009-04-04 Leicester City     Carlisle United            6
## 10: 2009-04-11 Leicester City     Hereford United            6
## 11: 2009-04-13 Leicester City        Leeds United            6
## 12: 2009-04-18 Leicester City     Southend United            6
## 13: 1921-05-02         Fulham           Hull City            7
## 14: 1921-05-07         Fulham           Hull City            7
## 15: 1921-08-27         Fulham       Coventry City            7
## 16: 1921-08-29         Fulham      Leicester City            7
## 17: 1921-09-03         Fulham       Coventry City            7
## 18: 1921-09-05         Fulham      Leicester City            7
## 19: 1921-09-10         Fulham           Hull City            7
## 20: 1920-04-17  Leyton Orient     Birmingham City            7
## 21: 1920-04-24  Leyton Orient     Birmingham City            7
## 22: 1920-04-26  Leyton Orient      Leicester City            7
## 23: 1920-05-01  Leyton Orient      Leicester City            7
## 24: 1920-08-28  Leyton Orient      Leicester City            7
## 25: 1920-08-30  Leyton Orient        Cardiff City            7
## 26: 1920-09-04  Leyton Orient      Leicester City            7
## 27: 1920-10-09   Notts County          Stoke City            7
## 28: 1920-10-16   Notts County          Stoke City            7
## 29: 1920-10-23   Notts County        Cardiff City            7
## 30: 1920-10-30   Notts County        Cardiff City            7
## 31: 1920-11-06   Notts County       Coventry City            7
## 32: 1920-11-13   Notts County       Coventry City            7
## 33: 1920-11-20   Notts County      Leicester City            7
##           date  matching_team               other chain_length
It seems super weird that longer chains would have happened only three times, all in quick succession within (roughly) one season!
It’s possible this is correct, and that the way fixtures were scheduled in 1920-1921 just happened to produce a lot of these chains, but it seems weird. Is there any raw data the package is built from that could be used to check the validity of these fixtures?
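One way to sanity-check the chain logic separately from the data is to run the same idea on a tiny fixture list where the answer is obvious. The sketch below is only an illustration, not the method used above: it swaps in base R's rle() for the lead()-based differencing, and the team names, dates, and the min_length threshold are all made up. Because rle() counts every match in a run, its lengths can differ by one from a count based on differences between consecutive rows.

```r
# Toy fixture list for one hypothetical team (made-up data, illustration only)
fixtures <- data.frame(
  date = as.Date(c("2000-01-01", "2000-01-08", "2000-01-15",
                   "2000-01-22", "2000-01-29", "2000-02-05")),
  other = c("Alpha Town", "Beta City", "Gamma City",
            "Delta City", "Epsilon United", "Zeta United"),
  stringsAsFactors = FALSE
)

# Sort by date, then take the last word of each opponent's name as its suffix
fixtures <- fixtures[order(fixtures$date), ]
fixtures$other_suffix <- sub(".* ", "", fixtures$other)

# rle() collapses consecutive repeats into (value, length) pairs
runs <- rle(fixtures$other_suffix)

# Label every match with the id and length of the run it belongs to
fixtures$chain_id <- rep(seq_along(runs$lengths), times = runs$lengths)
fixtures$chain_length <- rep(runs$lengths, times = runs$lengths)

# Keep only runs of at least min_length matches (hypothetical threshold)
min_length <- 3
long_chains <- fixtures[fixtures$chain_length >= min_length, ]
print(long_chains)
```

For the real data the same steps would need to be applied separately to each team's fixtures (for example by splitting on matching_team) so that a run cannot spill over from one team's last match into the next team's first.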