The Guardian publish a weekly set of questions and answers on a variety of football minutiae at The Knowledge. Fortunately, some of these are extremely tractable using R, so I thought I'd have a go at working through the archives to see if I can shed light on any of the questions.

library(rvest)
library(dplyr)
library(magrittr)
library(data.table)
library(zoo)
library(ggplot2)

#engsoccerdata package (GitHub: jalapic/engsoccerdata)
library(engsoccerdata)

Tranmere vs. Towns

‘This season, Tranmere Rovers return to contest League Two alongside eight teams with the suffix Town, including six successive fixtures against these clubs over the New Year. What is the record for successive fixtures versus clubs with the same (or no) prefix or suffix?’
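At heart this is a run-length problem. For one club, take its opponents in date order, reduce each name to its last word (its suffix), and find the longest stretch of identical suffixes. Here is a minimal base-R sketch using `rle()` on a made-up fixture list. The opponents are hypothetical, not Tranmere's real schedule, and `longest_run()` is an illustrative helper, not part of any package.

```r
# hypothetical opponents for one club, already in date order
opponents <- c("Crewe Alexandra", "Swindon Town", "Grimsby Town", "Harrogate Town",
               "Mansfield Town", "Northampton Town", "Salford City", "Port Vale")

# the suffix is the last word of the name; a one-word name is its own suffix
suffix <- sub(".* ", "", opponents)

# longest run of consecutive fixtures sharing the same value
# (illustrative helper; ties go to the earliest run)
longest_run <- function(x) {
  runs <- rle(x)
  best <- which.max(runs$lengths)
  list(value = runs$values[best], length = runs$lengths[best])
}

longest_run(suffix)
```

The same helper applied to `sub(" .*", "", opponents)` gives the longest run of a shared prefix instead.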

#take all of the English soccer data in the package and bind it together
england_data <- bind_rows(
    select(engsoccerdata::england, home, visitor, date = Date),
    select(engsoccerdata::englandplayoffs, home, visitor, date = Date),
    select(engsoccerdata::england1939, home, visitor, date = Date)) %>%
  setDT() %>%
  #convert the date to date class
  .[, date := as.Date(date)]

#get a list of each unique team in the dataset
all_teams <- unique(c(as.character(england_data$home),
                      as.character(england_data$visitor)))

#reshape so that each team has a row for every one of its matches
england_data_long <- rbindlist(lapply(all_teams, function(team) {
  england_data %>%
    .[home == team | visitor == team] %>%
    .[, matching_team := team]
  })) %>%
  .[home == matching_team, other := visitor] %>%
  .[visitor == matching_team, other := home] %>%
  .[, c("date", "matching_team", "other")] %>%
  #get the suffixes and prefixes of the other team
  .[, other_prefix := gsub(" .*", "", other)] %>%
  .[, other_suffix := gsub(".* ", "", other)] %>%
  #arrange by team and date
  .[order(matching_team, date)] %>%
  #convert each suffix to a numeric id
  .[, suffix_id := as.numeric(as.factor(other_suffix))] %>%
  #match is 0 when the next fixture is against the same suffix id (ignoring prefixes for now)
  .[, match := suffix_id - lead(suffix_id), by = "matching_team"] %>%
  #give each chain an id at its last zero, keep only the zeros,
  #then fill the id backwards so every row in the chain shares it
  .[match == 0 & lead(match) != 0, chain_id := 1:.N] %>%
  .[match == 0] %>%
  .[, chain_id := na.locf(chain_id, fromLast = TRUE)] %>%
  #count the rows in each chain
  .[, chain_length := .N, by = chain_id] %>%
  #take only chains at least as long as Tranmere's run (6)
  .[chain_length > 5] %>%
  .[order(chain_length)] %>%
  .[, c("date", "matching_team", "other", "chain_length")]

#print the chains at least as long as Tranmere's run
print(england_data_long)
##           date  matching_team               other chain_length
##  1: 1950-12-30   Chesterfield      Leicester City            6
##  2: 1951-01-13   Chesterfield     Manchester City            6
##  3: 1951-01-20   Chesterfield       Coventry City            6
##  4: 1951-02-03   Chesterfield        Cardiff City            6
##  5: 1951-02-17   Chesterfield     Birmingham City            6
##  6: 1951-02-24   Chesterfield        Swansea City            6
##  7: 2009-03-21 Leicester City   Colchester United            6
##  8: 2009-03-28 Leicester City Peterborough United            6
##  9: 2009-04-04 Leicester City     Carlisle United            6
## 10: 2009-04-11 Leicester City     Hereford United            6
## 11: 2009-04-13 Leicester City        Leeds United            6
## 12: 2009-04-18 Leicester City     Southend United            6
## 13: 1921-05-02         Fulham           Hull City            7
## 14: 1921-05-07         Fulham           Hull City            7
## 15: 1921-08-27         Fulham       Coventry City            7
## 16: 1921-08-29         Fulham      Leicester City            7
## 17: 1921-09-03         Fulham       Coventry City            7
## 18: 1921-09-05         Fulham      Leicester City            7
## 19: 1921-09-10         Fulham           Hull City            7
## 20: 1920-04-17  Leyton Orient     Birmingham City            7
## 21: 1920-04-24  Leyton Orient     Birmingham City            7
## 22: 1920-04-26  Leyton Orient      Leicester City            7
## 23: 1920-05-01  Leyton Orient      Leicester City            7
## 24: 1920-08-28  Leyton Orient      Leicester City            7
## 25: 1920-08-30  Leyton Orient        Cardiff City            7
## 26: 1920-09-04  Leyton Orient      Leicester City            7
## 27: 1920-10-09   Notts County          Stoke City            7
## 28: 1920-10-16   Notts County          Stoke City            7
## 29: 1920-10-23   Notts County        Cardiff City            7
## 30: 1920-10-30   Notts County        Cardiff City            7
## 31: 1920-11-06   Notts County       Coventry City            7
## 32: 1920-11-13   Notts County       Coventry City            7
## 33: 1920-11-20   Notts County      Leicester City            7
##           date  matching_team               other chain_length
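Before reading too much into these numbers, it is worth checking what `chain_length` counts. Below, the same lead-and-compare step is replayed in base R on a made-up suffix sequence for a single team. The base-R shift stands in for `dplyr::lead()`, and the sequence is hypothetical. Because `match` is the difference between a fixture's suffix id and the next fixture's, only fixtures that are *followed* by a same-suffix fixture get `match == 0`. As far as this toy case shows, the last fixture of each run drops out at the `match == 0` filter.

```r
# made-up suffixes of one team's opponents, in date order: a run of three Cities
suffix <- c("Town", "City", "City", "City", "United")
suffix_id <- as.numeric(as.factor(suffix))

# base-R stand-in for dplyr::lead() within one team: shift up one, pad with NA
next_id <- c(suffix_id[-1], NA)
match <- suffix_id - next_id

# rows that would survive the match == 0 filter
sum(match == 0, na.rm = TRUE)  # 2

# length of the City run itself, counted with rle()
runs <- rle(suffix)
runs$lengths[runs$values == "City"]  # 3
```

If that reading is right, each chain printed above is one fixture longer than its `chain_length`, and its final fixture is not among the printed rows.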

It seems super weird that the longer chains (chain_length 7) would only have happened three times, and all in quick succession between April 1920 and September 1921, i.e. within roughly one season!

It's possible this is correct, and that the way fixtures were scheduled in 1920-21 just happened to produce a lot of these chains, but it seems weird. Is there any raw data the package is built from that I could use to check the validity of these fixtures?
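One quick check on the "one season" question is to label each date with the season it falls in. Here is a minimal sketch that assumes an English season runs from July to June. The July cut-off is an assumption, and the sketch ignores wartime and other irregular seasons. `season_of()` is a hypothetical helper, applied here to the first and last printed dates of the three longer chains above.

```r
# label a date with its season, e.g. "1920-21"
# assumption: seasons start in July (start_month = 7); not taken from the data
season_of <- function(date, start_month = 7) {
  date <- as.Date(date)
  year <- as.integer(format(date, "%Y"))
  month <- as.integer(format(date, "%m"))
  start <- ifelse(month >= start_month, year, year - 1L)
  sprintf("%d-%02d", start, (start + 1L) %% 100L)
}

# first and last printed dates of the three chain_length 7 chains
chains <- data.frame(
  team  = c("Fulham", "Leyton Orient", "Notts County"),
  first = as.Date(c("1921-05-02", "1920-04-17", "1920-10-09")),
  last  = as.Date(c("1921-09-10", "1920-09-04", "1920-11-20"))
)
chains$first_season <- season_of(chains$first)
chains$last_season  <- season_of(chains$last)
chains
```

Under that assumption, only the Notts County chain sits inside a single season (1920-21). The Fulham and Leyton Orient chains each run across a summer break.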