email: jc3181 AT columbia DOT edu

twitter: jalapic

 

Introduction

On this week’s Football Weekly podcast, I heard John Ashdown pose this question:

 

What is the highest scoring result that has repeated itself at least 3 times in a row between the same two sides?

 

This is the sort of pointless trivia that my engsoccerdata R package was destined to answer. This package contains every result of every single English soccer game (in the top 4 tiers) ever… as well as FA Cup games, La Liga, Bundesliga, Serie A, Eredivisie etc. etc.

 

If you don’t want to know ‘how’ I did it, and are just interested in the games… skip to the tables at the bottom.

 

Code

I’m not going to explain this in too much detail right now, though I’m happy to if anyone would like me to. Essentially, the code assigns every pairing of two teams a unique ID, and then works out the lengths of runs of consecutive games between them that ended with the same scoreline. For these purposes I’m treating e.g. a 0-6 away win as equivalent to a 6-0 home win: we’re only interested in the overall score. I then keep only the highest scoring sequences.
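To make the two key ideas concrete before the real code, here is a minimal sketch on made-up data. The teams "A" and "B" and their scores are purely hypothetical; only base R functions (pmin, pmax, paste, rle) are used. It shows how an order-independent ID and scoreline are built, and how rle() then counts consecutive repeats.

```r
# Toy fixtures between two made-up teams, already in date order (hypothetical data)
toy <- data.frame(
  home    = c("A", "B", "A", "B", "A"),
  visitor = c("B", "A", "B", "A", "B"),
  hgoal   = c(6, 0, 6, 2, 1),
  vgoal   = c(0, 6, 0, 1, 2),
  stringsAsFactors = FALSE
)

# Pairing ID: alphabetically first team, then the other, regardless of venue
toy$matchid <- paste(pmin(toy$home, toy$visitor), pmax(toy$home, toy$visitor))

# Scoreline as "higher lower", so a 0-6 away win reads the same as a 6-0 home win
toy$FT1 <- paste(pmax(toy$hgoal, toy$vgoal), pmin(toy$hgoal, toy$vgoal))

# Run-length encoding: how many times in a row each scoreline occurred
runs <- rle(toy$FT1)
runs$lengths  # 3 2
runs$values   # "6 0" "2 1"
```

Here the first three games all count as "6 0", giving a run of length 3, followed by a run of two "2 1" results.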

### Highest score to have repeated itself 3 times in a row between two teams...

library(engsoccerdata)
library(dplyr)
library(magrittr)

head(engsoccerdata2)

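# Sort chronologically, then build two order-independent keys:
#   matchid - the pairing of teams, whichever side is at home
#   FT1     - the scoreline as "higher lower", so 0-6 and 6-0 look the same
temp <- engsoccerdata2 %>% 
  arrange(Date) %>%
  mutate(matchid = paste(pmin(home, visitor), pmax(home, visitor)),
         FT1 = paste(pmax(hgoal, vgoal), pmin(hgoal, vgoal))) %$%
  split(., matchid)   # one data frame per pairing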



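# Run-length encode the scorelines of one pairing; keep runs of 3 or more
getseqs <- function(x){
  runs <- rle(x$FT1)   # compute the encoding once
  data.frame(lengths = runs$lengths,
             vals = runs$values) %>%
    filter(lengths >= 3)
}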

temp1 <- lapply(temp, getseqs) # get sequences of >=3

temp2 <- Filter(function(x) dim(x)[1] > 0, temp1) #get rid of empty dfs (i.e. matches where no seq of >=3)

temp3 <- Map(cbind, temp2, matchid = names(temp2)) #add name/matchid column

mydf <- do.call("rbind", temp3)

rownames(mydf)<-NULL #easier to read if get rid of rownames

head(mydf)

library(tidyr)

mydf %<>% separate(vals, c("g1", "g2"))

mydf$g1 <- as.numeric(mydf$g1)
mydf$g2 <- as.numeric(mydf$g2)

mydf %>%
  mutate(totgoals = g1+g2) %>%
  filter(totgoals >=5) %>%
  arrange(desc(totgoals))

 

The code above returns every such run with at least 5 total goals; the table below shows the ones with 6 or more. The column “lengths” is how many games in a row a particular scoreline occurred between those two teams. As you can see, all of these scorelines occurred 3 times in a row:

lengths  result  match                              total goals
3        5-2     Bristol Rovers v Walsall           7
3        5-1     Arsenal v Manchester United        6
3        4-2     Burnley v Wolverhampton Wanderers  6
3        3-3     Everton v Nottingham Forest        6
3        4-2     Leeds United v Rotherham United    6
3        5-1     Manchester City v Tottenham Hotspur  6
3        3-3     Sheffield Wednesday v Sunderland   6

 

Here are the dates of these games, where at least one team scored 5:

date        home            visitor         FT   division
1937-03-13  Walsall         Bristol Rovers  5-2  3S
1937-12-25  Bristol Rovers  Walsall         5-2  3S
1937-12-27  Walsall         Bristol Rovers  5-2  3S

 

date        home            visitor         FT   division
1898-01-08  Arsenal         Manchester Utd  5-1  2
1898-02-26  Manchester Utd  Arsenal         5-1  2
1898-12-03  Arsenal         Manchester Utd  5-1  2

 

date        home      visitor   FT   division
1957-09-28  Man City  Spurs     5-1  1
1958-02-08  Spurs     Man City  5-1  1
1958-11-01  Man City  Spurs     5-1  1
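The code earlier in this post stops at the run lengths and doesn't show how to get back to the individual fixtures. One possible way to do that is sketched below. The helper name run_rows and the toy fixtures are hypothetical, and this is not necessarily how the date tables above were produced. It assumes a data frame for a single pairing, sorted by date, with an FT1 column like the one built in the main code.

```r
# Hypothetical helper: return the rows of `df` that sit inside a run of at
# least `n` identical scorelines. Assumes `df` is one pairing, in date order.
run_rows <- function(df, n = 3) {
  r <- rle(df$FT1)
  # Expand the per-run TRUE/FALSE flag back out to one flag per game
  in_run <- rep(r$lengths >= n, times = r$lengths)
  df[in_run, , drop = FALSE]
}

# Made-up fixtures between two made-up teams (not real results)
fixtures <- data.frame(
  Date    = as.Date(c("2001-01-01", "2001-05-01", "2002-01-01",
                      "2002-05-01", "2003-01-01")),
  home    = c("A", "B", "A", "B", "A"),
  visitor = c("B", "A", "B", "A", "B"),
  FT1     = c("2 1", "5 1", "5 1", "5 1", "3 0"),
  stringsAsFactors = FALSE
)

out <- run_rows(fixtures)
out   # the three consecutive "5 1" games, with their dates
```

With the real data, the same helper could be applied to any element of the list of per-pairing data frames.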

 

There you go!

… and in case anyone is still reading and still cares… there have been two runs where a scoreline with a combined five goals or more occurred four times in a row between the same two teams. Both involve Manchester City. In the 1955/56 and 1956/57 seasons in division 1, every game between Manchester City and Luton Town ended in a 3-2 home win. The other run is against Leicester: the last game between Leicester and Manchester City of the 1925/26 division 1 season ended in a 3-2 away win for Man City. They didn’t play each other again until the 1928/29 season, when both games ended with Leicester winning 3-2. The following season, the first game they played ended with Man City winning 3-2 in Manchester!