James P. Curley - 6 Dec 2014

jc3181 AT columbia DOT edu

Every Saturday in the UK, that day’s EPL games are shown in a highlights TV package called Match of the Day. Obviously, more people tune in when there have been a lot of goals in that day’s games. In some weeks, though, very few goals are scored. I was interested in finding out on which date in history the fewest goals, and the fewest goals per game, were scored in the top flight of English football. I can do this extremely quickly using my engsoccerdata R package. See here for more details. Here I shall walk you through how to do it.

 


Install required packages

First, install engsoccerdata if you have not already. It is installed from GitHub, so make sure the devtools package is installed and loaded first:

library(devtools)
# The "user/repo" form already names the account, so no username argument is needed
install_github("jalapic/engsoccerdata")
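
If you rerun this post often, a guarded install avoids reinstalling the package every time. The helper below is only a sketch: the name `ensure_github_pkg` is hypothetical (it is not part of devtools), and it assumes devtools is already installed.

```r
# Hypothetical helper: install a GitHub package only when it is missing.
ensure_github_pkg <- function(pkg, repo) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Reached only when the package is not yet installed
    devtools::install_github(repo)
  }
  # TRUE if the package can now be loaded, FALSE otherwise
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Usage (needs network access the first time):
# ensure_github_pkg("engsoccerdata", "jalapic/engsoccerdata")
```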

 

Processing data

Load required packages.

library(engsoccerdata)
library(dplyr)
library(lubridate)
library(ggplot2)

 

Obviously, some dates had only one game played, and that game could end 0-0. Therefore, I will impose a minimum of six games played on any particular date. Further, the engsoccerdata2 dataset contains all English professional league soccer results from 1888-2014 (>180,000 complete games), so for simplicity we will only consider matches in the top division.

Here I use dplyr to quickly create a summary dataframe of the number of games played, total goals scored and goals-per-game on every date in English soccer history. The variable totgoal in engsoccerdata2 represents the total goals scored in each game.

library(engsoccerdata)
df <- engsoccerdata2

# gp = games played for each unique date
# total = total goals scored on each unique date
# gpg = goals-per-game on each unique date

df.summary <- df %>%
  filter(tier==1) %>%
  group_by(Date) %>%
  summarise(gp =n(), total=sum(totgoal), gpg=total/gp) %>% 
  filter(gp >= 6) %>%
  arrange(gpg)

df.summary
## Source: local data frame [4,046 x 4]
## 
##          Date gp total      gpg
## 1  2001-11-24  6     3 0.500000
## 2  1971-04-12  8     6 0.750000
## 3  1923-04-28 10    10 1.000000
## 4  1904-02-13  8     9 1.125000
## 5  1993-03-10  6     7 1.166667
## 6  1922-03-18 11    13 1.181818
## 7  1925-05-02  9    11 1.222222
## 8  1998-08-29  8    10 1.250000
## 9  1923-09-15 11    14 1.272727
## 10 1947-09-13 11    14 1.272727
## ..        ... ..   ...      ...
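
To see why the minimum-games filter matters, here is a small sketch of the same pipeline on made-up data. The dates and goal counts are invented for illustration, and the threshold is lowered to 2 so the toy data stays short; only the column names (Date, tier, totgoal) follow engsoccerdata2.

```r
library(dplyr)

# Toy data: three dates with 3, 2 and 1 top-tier games (values are made up)
toy <- data.frame(
  Date    = as.Date(c("2000-01-01", "2000-01-01", "2000-01-01",
                      "2000-01-08", "2000-01-08", "2000-01-15")),
  tier    = 1,
  totgoal = c(0, 1, 2, 4, 2, 0)
)

toy.summary <- toy %>%
  filter(tier == 1) %>%
  group_by(Date) %>%
  summarise(gp = n(), total = sum(totgoal), gpg = total / gp) %>%
  filter(gp >= 2) %>%   # toy threshold; the post uses 6
  arrange(gpg)

toy.summary
```

Without the `gp >= 2` step, the lone 0-0 on 2000-01-15 would top the list with 0 goals per game, which is exactly the kind of uninteresting result the filter is meant to exclude.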

 

This summary dataframe is arranged in ascending order of goals-per-game on each date. We can see that the lowest goals-per-game figure came on 24th November 2001. We can use the lubridate package to find out what day of the week that was, and use dplyr to return the games played that day:

df$Date <- as.Date(df$Date, format="%Y-%m-%d") 
as.character(wday("2001-11-24", label=TRUE))
## [1] "Sat"
df %>% 
  filter(tier==1 & Date=="2001-11-24") %>%
  select(Date,home, visitor,FT)
##         Date             home           visitor  FT
## 1 2001-11-24 Bolton Wanderers            Fulham 0-0
## 2 2001-11-24          Chelsea  Blackburn Rovers 0-0
## 3 2001-11-24   Leicester City           Everton 0-0
## 4 2001-11-24 Newcastle United      Derby County 1-0
## 5 2001-11-24      Southampton Charlton Athletic 1-0
## 6 2001-11-24  West Ham United Tottenham Hotspur 0-1
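
As a cross-check that does not depend on lubridate or on the system locale, the weekday can also be read from base R's POSIXlt representation. This is a minimal sketch; the variable names `day_index` and `day_name` are mine.

```r
# Base R cross-check: POSIXlt stores the weekday as 0 (Sunday) to 6 (Saturday)
d <- as.Date("2001-11-24")
day_index <- as.POSIXlt(d)$wday

# Map the index onto English abbreviations, independent of locale
day_name <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")[day_index + 1]
day_name
## [1] "Sat"
```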

Next I decided to take a look at the dates on which the fewest goals were scored for each number of games played on a given date (6 games, 7 games, 8 games and so on).

First, calculate the distribution of number of games played on unique dates:

table(df.summary$gp)
## 
##    6    7    8    9   10   11 
##  353  456  526  496  611 1604

As can be seen, the maximum number of games ever played in the top tier on a unique date is 11.  

To get the date(s) with the fewest goals for each number of games played, we can group by games played with ‘group_by’ and then use ‘filter’ to keep the rows whose total equals the minimum within each group. Ties are kept, so a group can return more than one date. In addition, I add the day of the week into a new variable using ‘mutate’.

df.summary %>%
  group_by(gp) %>%
  filter(total == min(total)) %>%
  arrange(gp) %>%
  mutate(day = as.character(wday(Date, label=TRUE)))
## Source: local data frame [10 x 5]
## Groups: gp
## 
##          Date gp total      gpg day
## 1  2001-11-24  6     3 0.500000 Sat
## 2  1900-03-24  7     9 1.285714 Sat
## 3  1912-03-09  7     9 1.285714 Sat
## 4  1977-04-11  7     9 1.285714 Mon
## 5  1982-04-03  7     9 1.285714 Sat
## 6  2002-09-21  7     9 1.285714 Sat
## 7  1971-04-12  8     6 0.750000 Mon
## 8  1925-05-02  9    11 1.222222 Sat
## 9  1923-04-28 10    10 1.000000 Sat
## 10 1922-03-18 11    13 1.181818 Sat
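
The reason several 7-game dates appear above is that filtering on `total == min(total)` keeps every tied row. A small sketch on invented summary rows (the dates and totals below are made up) shows the behaviour:

```r
library(dplyr)

# Made-up summary rows: two dates tie for the fewest goals among 7-game days
toy.summary <- data.frame(
  Date  = as.Date(c("1990-01-06", "1990-01-13", "1990-01-20", "1990-01-27")),
  gp    = c(6, 7, 7, 7),
  total = c(5, 9, 9, 12)
)

toy.min <- toy.summary %>%
  group_by(gp) %>%
  filter(total == min(total)) %>%   # keeps every row that ties for the minimum
  ungroup() %>%
  arrange(gp, Date)

toy.min
```

Both 9-goal dates survive for gp = 7. If exactly one row per group were wanted instead, dplyr's `slice_min(total, n = 1, with_ties = FALSE)` would be one option, at the cost of silently dropping the other tied dates.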

 

One last thing:

Just out of interest, I also decided to make a plot examining games played on unique dates by total number of goals scored on that date, including data from all divisions.

df.all <- 
  df %>%
  group_by(Date) %>%
  summarise(gp =n(), total=sum(totgoal), gpg=total/gp)

ggplot(df.all, aes(gp, total)) +
  geom_point(color="dodgerblue", size=2) +
  xlab("Games played on unique date") +
  ylab("Total goals scored") +
  ggtitle("Total scoring on unique dates in English soccer history") +
   theme(
    panel.grid.major.x = element_line(color="gray85"),
    panel.grid.major.y = element_line(color="gray85"),
    axis.ticks.x = element_blank(),
    axis.ticks.y = element_blank(),
    panel.grid.minor = element_blank(),
    plot.background  = element_rect(color = "ghostwhite"),
    panel.background = element_blank(),
    plot.title = element_text(hjust=0,vjust=1)
    )

 

Looking at this chart, it appears that only twice in history have more than 200 goals been scored on a given date. It’s super easy to find out when these were:

df.all %>% filter(total >= 200)
## Source: local data frame [2 x 4]
## 
##         Date gp total      gpg
## 1 1932-01-02 43   209 4.860465
## 2 1936-02-01 44   209 4.750000

Both of these were in the 1930s and both had 209 goals scored! Here are the results for 2nd January 1932…

df %>% 
  filter(Date == "1932-01-02") %>%
  select(Date, home, visitor, FT, division) %>%
  arrange(division, home) 
##          Date                 home                 visitor  FT division
## 1  1932-01-02      Birmingham City                 Everton 4-0        1
## 2  1932-01-02              Chelsea           Middlesbrough 4-0        1
## 3  1932-01-02         Derby County               Blackpool 5-0        1
## 4  1932-01-02         Grimsby Town       Huddersfield Town 1-4        1
## 5  1932-01-02       Leicester City             Aston Villa 3-8        1
## 6  1932-01-02            Liverpool        Newcastle United 4-2        1
## 7  1932-01-02           Portsmouth        Sheffield United 2-1        1
## 8  1932-01-02  Sheffield Wednesday        Blackburn Rovers 5-1        1
## 9  1932-01-02           Sunderland         Manchester City 2-5        1
## 10 1932-01-02 West Bromwich Albion                 Arsenal 1-0        1
## 11 1932-01-02      West Ham United        Bolton Wanderers 3-1        1
## 12 1932-01-02        Bradford City                Barnsley 9-1        2
## 13 1932-01-02              Burnley             Southampton 1-3        2
## 14 1932-01-02                 Bury            Bristol City 2-1        2
## 15 1932-01-02         Chesterfield              Stoke City 1-3        2
## 16 1932-01-02         Leeds United            Swansea City 3-2        2
## 17 1932-01-02    Manchester United    Bradford Park Avenue 0-2        2
## 18 1932-01-02             Millwall            Notts County 4-3        2
## 19 1932-01-02    Nottingham Forest       Charlton Athletic 3-2        2
## 20 1932-01-02            Port Vale         Plymouth Argyle 2-0        2
## 21 1932-01-02    Preston North End         Oldham Athletic 2-3        2
## 22 1932-01-02    Tottenham Hotspur Wolverhampton Wanderers 3-3        2
## 23 1932-01-02   Accrington Stanley                Rochdale 3-0       3a
## 24 1932-01-02               Barrow                 Walsall 7-1       3a
## 25 1932-01-02      Carlisle United       Hartlepool United 3-2       3a
## 26 1932-01-02           Darlington            Lincoln City 0-6       3a
## 27 1932-01-02            Gateshead            New Brighton 4-0       3a
## 28 1932-01-02         Halifax Town               Hull City 2-2       3a
## 29 1932-01-02     Rotherham United               Southport 2-0       3a
## 30 1932-01-02     Stockport County        Doncaster Rovers 1-0       3a
## 31 1932-01-02      Tranmere Rovers               York City 2-2       3a
## 32 1932-01-02              Wrexham         Crewe Alexandra 2-4       3a
## 33 1932-01-02       Bristol Rovers         AFC Bournemouth 4-1       3b
## 34 1932-01-02         Cardiff City        Northampton Town 5-0       3b
## 35 1932-01-02        Coventry City                  Fulham 5-5       3b
## 36 1932-01-02          Exeter City                  Thames 4-1       3b
## 37 1932-01-02           Gillingham         Southend United 4-0       3b
## 38 1932-01-02        Leyton Orient                 Watford 2-2       3b
## 39 1932-01-02           Luton Town                 Reading 6-1       3b
## 40 1932-01-02         Norwich City  Brighton & Hove Albion 2-1       3b
## 41 1932-01-02  Queens Park Rangers               Brentford 1-2       3b
## 42 1932-01-02         Swindon Town          Mansfield Town 5-2       3b
## 43 1932-01-02       Torquay United          Crystal Palace 3-1       3b
#division 3a = division 3 North
#division 3b = division 3 South

Wow! What a day of football. Stockport - Doncaster was the one to avoid on this day.
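
The FT column in the table above is a "home-away" string, and totgoal is simply the sum of its two parts. The sketch below recomputes the totals for three of the results listed; it assumes every FT value has exactly that format, and the names `hgoal` and `vgoal` are just local variables chosen for this example.

```r
# Three results copied from the table above:
# Leicester City 3-8 Aston Villa, Coventry City 5-5 Fulham,
# Stockport County 1-0 Doncaster Rovers
ft <- c("3-8", "5-5", "1-0")

# Split each "home-away" string and convert the pieces to integers
parts <- strsplit(ft, "-", fixed = TRUE)
hgoal <- as.integer(sapply(parts, `[`, 1))
vgoal <- as.integer(sapply(parts, `[`, 2))

totgoal <- hgoal + vgoal
totgoal
## [1] 11 10  1
```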

The other notable data point from the above chart is the outlying point at about 45 games played and only approximately 70 goals scored. To find this point, I did the following:

df.all %>%
  filter(gp > 40 & total <80 ) %>%
  arrange(gpg)
## Source: local data frame [6 x 4]
## 
##         Date gp total      gpg
## 1 1925-04-04 44    71 1.613636
## 2 1986-08-30 41    73 1.780488
## 3 1980-03-01 41    76 1.853659
## 4 2008-03-01 41    76 1.853659
## 5 1922-11-25 42    78 1.857143
## 6 1985-04-06 42    78 1.857143

This outlier is 4th April 1925, when only 71 goals were scored in 44 games. The following season the offside law was changed to encourage more goal scoring.

   

This was just a quick look at some extreme scoring patterns across different dates. It gives you a flavor of the type of questions that can be explored simply with this dataset.

Any questions or comments, please email me at jc3181 AT columbia DOT edu