Every Saturday in the UK, that day's EPL games are shown in a highlights TV package called Match of the Day. Presumably more people tune in when there have been a lot of goals during that day's games. Some weeks, though, very few goals are scored. I was interested in finding out on which date in history the fewest goals, and the lowest goals-per-game, occurred in the top flight of English football. I can do this very quickly using my engsoccerdata R package. See here for more details. Here I shall walk you through how to do it.
First, install engsoccerdata if you have not already. Make sure you have the devtools package loaded first:
library(devtools)
install_github("jalapic/engsoccerdata")
Load required packages.
library(engsoccerdata)
library(dplyr)
library(lubridate)
library(ggplot2)
Obviously, some dates had only one game played, and that game could have ended 0-0. Therefore, I will impose a minimum of six games played on any particular date. Further, the engsoccerdata2 dataset contains all English professional league soccer results from 1888-2014 (>180,000 complete games), so for simplicity we will only consider matches in the top division.
Here I use dplyr to quickly create a summary dataframe of the number of games played, total goals scored and goals-per-game on every date in top-flight English soccer history. The variable totgoal in engsoccerdata2 represents the total goals scored in each game.
library(engsoccerdata)
df <- engsoccerdata2
# gp = games played for each unique date
# total = total goals scored on each unique date
# gpg = goals-per-game on each unique date
df.summary <- df %>%
  filter(tier == 1) %>%
  group_by(Date) %>%
  summarise(gp = n(), total = sum(totgoal), gpg = total / gp) %>%
  filter(gp >= 6) %>%
  arrange(gpg)
df.summary
## Source: local data frame [4,046 x 4]
##
## Date gp total gpg
## 1 2001-11-24 6 3 0.500000
## 2 1971-04-12 8 6 0.750000
## 3 1923-04-28 10 10 1.000000
## 4 1904-02-13 8 9 1.125000
## 5 1993-03-10 6 7 1.166667
## 6 1922-03-18 11 13 1.181818
## 7 1925-05-02 9 11 1.222222
## 8 1998-08-29 8 10 1.250000
## 9 1923-09-15 11 14 1.272727
## 10 1947-09-13 11 14 1.272727
## .. ... .. ... ...
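To make the effect of each step easier to see, here is a minimal sketch of the same pipeline on a tiny made-up data frame. The column names (Date, tier, totgoal) mirror engsoccerdata2, but the values are hypothetical and the games-played threshold is lowered to 2 so the toy data stays small.

```r
library(dplyr)

# Hypothetical toy data, NOT taken from engsoccerdata2.
toy <- data.frame(
  Date    = as.Date(c("2000-01-01", "2000-01-01", "2000-01-01",
                      "2000-01-08",
                      "2000-01-15", "2000-01-15", "2000-01-15")),
  tier    = c(1, 1, 1, 1, 1, 1, 2),
  totgoal = c(0, 1, 2, 0, 5, 2, 3)
)

toy.summary <- toy %>%
  filter(tier == 1) %>%                  # drops the single tier 2 game
  group_by(Date) %>%
  summarise(gp = n(), total = sum(totgoal), gpg = total / gp) %>%
  filter(gp >= 2) %>%                    # toy threshold; the post uses 6
  arrange(gpg)

toy.summary
```

The date with a single 0-0 game (2000-01-08) would have topped the list with a goals-per-game of 0, which is exactly why the minimum-games filter is there.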
This summary dataframe is arranged in ascending order of goals-per-game on each date. We can see that the lowest goals per game was on 24th November 2001. We can use the lubridate package to find out what day of the week that was, and use dplyr to return which games they were:
df$Date <- as.Date(df$Date, format="%Y-%m-%d")
as.character(wday("2001-11-24", label = TRUE))
## [1] "Sat"
df %>%
  filter(tier == 1 & Date == "2001-11-24") %>%
  select(Date, home, visitor, FT)
## Date home visitor FT
## 1 2001-11-24 Bolton Wanderers Fulham 0-0
## 2 2001-11-24 Chelsea Blackburn Rovers 0-0
## 3 2001-11-24 Leicester City Everton 0-0
## 4 2001-11-24 Newcastle United Derby County 1-0
## 5 2001-11-24 Southampton Charlton Athletic 1-0
## 6 2001-11-24 West Ham United Tottenham Hotspur 0-1
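As a quick sanity check on the summary row for this date, the six full-time scores listed above can be added up by hand in base R. This is only a sketch: the scores are typed in from the output rather than pulled from the dataset, and the variable names (ft, goals, dow) are my own.

```r
# Full-time scores copied from the six games listed above.
ft <- c("0-0", "0-0", "0-0", "1-0", "1-0", "0-1")

# Split each "home-away" string and add the two numbers together.
parts <- strsplit(ft, "-", fixed = TRUE)
goals <- sapply(parts, function(p) sum(as.integer(p)))

total <- sum(goals)           # total goals on the date
gpg   <- total / length(ft)   # goals per game

# Day of week without lubridate: 0 = Sunday, ..., 6 = Saturday.
dow <- as.POSIXlt(as.Date("2001-11-24"))$wday

c(total = total, gpg = gpg, dow = dow)
```

This gives 3 goals in 6 games (0.5 per game) and a weekday index of 6, i.e. Saturday, matching the results above.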
Next, I decided to look at the dates on which the fewest goals were scored for each number of games played on a given date (6 games, 7 games, 8 games, and so on).
First, calculate the distribution of number of games played on unique dates:
table(df.summary$gp)
##
## 6 7 8 9 10 11
## 353 456 526 496 611 1604
As can be seen, the maximum number of games ever played in the top tier on a unique date is 11.
To get the fewest for each number of games played, we can simply use ‘filter’ to return the minimum number of total goals scored in conjunction with grouping games played with ‘group_by’. In addition, I add the day of the week into a new variable using ‘mutate’.
df.summary %>%
  group_by(gp) %>%
  filter(total == min(total)) %>%
  arrange(gp) %>%
  mutate(day = as.character(wday(Date, label = TRUE)))
## Source: local data frame [10 x 5]
## Groups: gp
##
## Date gp total gpg day
## 1 2001-11-24 6 3 0.500000 Sat
## 2 1900-03-24 7 9 1.285714 Sat
## 3 1912-03-09 7 9 1.285714 Sat
## 4 1977-04-11 7 9 1.285714 Mon
## 5 1982-04-03 7 9 1.285714 Sat
## 6 2002-09-21 7 9 1.285714 Sat
## 7 1971-04-12 8 6 0.750000 Mon
## 8 1925-05-02 9 11 1.222222 Sat
## 9 1923-04-28 10 10 1.000000 Sat
## 10 1922-03-18 11 13 1.181818 Sat
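Note that filtering on the group-wise minimum keeps every tied date, which is why there are five rows for seven games played. A small sketch on made-up summary rows (hypothetical values, not real results) shows the behaviour:

```r
library(dplyr)

# Hypothetical summary rows, NOT real results.
toy.summary <- data.frame(
  Date  = as.Date(c("2000-01-01", "2000-01-08", "2000-01-15",
                    "2000-01-22", "2000-01-29")),
  gp    = c(6, 6, 7, 7, 7),
  total = c(10, 8, 12, 9, 9)
)

toy.min <- toy.summary %>%
  group_by(gp) %>%
  filter(total == min(total)) %>%   # keeps all rows tied for the minimum
  ungroup() %>%
  arrange(gp, Date)

toy.min
```

Here the group with six games returns one date, while the group with seven games returns both dates that share the minimum of 9 goals.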
Just out of interest, I also decided to make a plot examining games played on unique dates by total number of goals scored on that date, including data from all divisions.
df.all <-
  df %>%
  group_by(Date) %>%
  summarise(gp = n(), total = sum(totgoal), gpg = total / gp)
ggplot(df.all, aes(gp, total)) +
  geom_point(color = "dodgerblue", size = 2) +
  xlab("Games played on unique date") +
  ylab("Total goals scored") +
  ggtitle("Total scoring on unique dates in English soccer history") +
  theme(
    panel.grid.major.x = element_line(color = "gray85"),
    panel.grid.major.y = element_line(color = "gray85"),
    axis.ticks.x = element_blank(),
    axis.ticks.y = element_blank(),
    panel.grid.minor = element_blank(),
    plot.background = element_rect(color = "ghostwhite"),
    panel.background = element_blank(),
    plot.title = element_text(hjust = 0, vjust = 1)
  )
Looking at this chart, it appears that only twice in history have more than 200 goals been scored on a given date. It’s super easy to find out when these were:
df.all %>% filter(total >= 200)
## Source: local data frame [2 x 4]
##
## Date gp total gpg
## 1 1932-01-02 43 209 4.860465
## 2 1936-02-01 44 209 4.750000
Both of these were in the 1930s and both had 209 goals scored! Here are the results for 2nd January 1932…
df %>%
  filter(Date == "1932-01-02") %>%
  select(Date, home, visitor, FT, division) %>%
  arrange(division, home)
## Date home visitor FT division
## 1 1932-01-02 Birmingham City Everton 4-0 1
## 2 1932-01-02 Chelsea Middlesbrough 4-0 1
## 3 1932-01-02 Derby County Blackpool 5-0 1
## 4 1932-01-02 Grimsby Town Huddersfield Town 1-4 1
## 5 1932-01-02 Leicester City Aston Villa 3-8 1
## 6 1932-01-02 Liverpool Newcastle United 4-2 1
## 7 1932-01-02 Portsmouth Sheffield United 2-1 1
## 8 1932-01-02 Sheffield Wednesday Blackburn Rovers 5-1 1
## 9 1932-01-02 Sunderland Manchester City 2-5 1
## 10 1932-01-02 West Bromwich Albion Arsenal 1-0 1
## 11 1932-01-02 West Ham United Bolton Wanderers 3-1 1
## 12 1932-01-02 Bradford City Barnsley 9-1 2
## 13 1932-01-02 Burnley Southampton 1-3 2
## 14 1932-01-02 Bury Bristol City 2-1 2
## 15 1932-01-02 Chesterfield Stoke City 1-3 2
## 16 1932-01-02 Leeds United Swansea City 3-2 2
## 17 1932-01-02 Manchester United Bradford Park Avenue 0-2 2
## 18 1932-01-02 Millwall Notts County 4-3 2
## 19 1932-01-02 Nottingham Forest Charlton Athletic 3-2 2
## 20 1932-01-02 Port Vale Plymouth Argyle 2-0 2
## 21 1932-01-02 Preston North End Oldham Athletic 2-3 2
## 22 1932-01-02 Tottenham Hotspur Wolverhampton Wanderers 3-3 2
## 23 1932-01-02 Accrington Stanley Rochdale 3-0 3a
## 24 1932-01-02 Barrow Walsall 7-1 3a
## 25 1932-01-02 Carlisle United Hartlepool United 3-2 3a
## 26 1932-01-02 Darlington Lincoln City 0-6 3a
## 27 1932-01-02 Gateshead New Brighton 4-0 3a
## 28 1932-01-02 Halifax Town Hull City 2-2 3a
## 29 1932-01-02 Rotherham United Southport 2-0 3a
## 30 1932-01-02 Stockport County Doncaster Rovers 1-0 3a
## 31 1932-01-02 Tranmere Rovers York City 2-2 3a
## 32 1932-01-02 Wrexham Crewe Alexandra 2-4 3a
## 33 1932-01-02 Bristol Rovers AFC Bournemouth 4-1 3b
## 34 1932-01-02 Cardiff City Northampton Town 5-0 3b
## 35 1932-01-02 Coventry City Fulham 5-5 3b
## 36 1932-01-02 Exeter City Thames 4-1 3b
## 37 1932-01-02 Gillingham Southend United 4-0 3b
## 38 1932-01-02 Leyton Orient Watford 2-2 3b
## 39 1932-01-02 Luton Town Reading 6-1 3b
## 40 1932-01-02 Norwich City Brighton & Hove Albion 2-1 3b
## 41 1932-01-02 Queens Park Rangers Brentford 1-2 3b
## 42 1932-01-02 Swindon Town Mansfield Town 5-2 3b
## 43 1932-01-02 Torquay United Crystal Palace 3-1 3b
#division 3a = division 3 North
#division 3b = division 3 South
Wow! What a day of football. Stockport - Doncaster was the one to avoid on this day.
The other notable data point from the above chart is the outlying point at about 45 games played and only approximately 70 goals scored. To find this point, I did the following:
df.all %>%
  filter(gp > 40 & total < 80) %>%
  arrange(gpg)
## Source: local data frame [6 x 4]
##
## Date gp total gpg
## 1 1925-04-04 44 71 1.613636
## 2 1986-08-30 41 73 1.780488
## 3 1980-03-01 41 76 1.853659
## 4 2008-03-01 41 76 1.853659
## 5 1922-11-25 42 78 1.857143
## 6 1985-04-06 42 78 1.857143
This outlier is 4th April 1925, when only 71 goals were scored in 44 games. The following season the offside law was changed to encourage more goal scoring.
This was just a quick look at some extreme scoring patterns across different dates. It gives you a flavor of the type of questions that can be explored simply with this dataset.
Any questions or comments, please email me at jc3181 AT columbia DOT edu