The 2016-17 FA Cup semi-finalists are Chelsea (1st in the EPL), Tottenham Hotspur (2nd), Manchester City (3rd) and Arsenal (5th). Is this the best-ever line-up of semi-finalists? I used my engsoccerdata R package, available on CRAN, to find out. The dataset contains hundreds of thousands of professional soccer results, including all from England since 1877.

Here’s how I looked for the best set of semi-finalists ever.

 

Load the packages

library(tidyverse)
library(engsoccerdata)

 

# get date of first semi for each season
# (the Football League began in 1888-89; 1945-46 had an FA Cup but no league season)
facup %>% 
  filter(Season>=1888) %>%
  filter(Season!=1945) %>%
  filter(round=="s") %>%
  group_by(Season) %>%
  arrange(Date) %>%
  filter(row_number()==1) %>%
  select(Season,Date) -> semi_dates

head(semi_dates)
## Source: local data frame [6 x 2]
## Groups: Season [6]
## 
##   Season       Date
##    <int>     <date>
## 1   1888 1889-03-16
## 2   1889 1890-03-08
## 3   1890 1891-02-28
## 4   1891 1892-02-27
## 5   1892 1893-03-04
## 6   1893 1894-03-10

 

# get teams in semis for each season
facup %>% 
  filter(Season>=1888) %>%
  filter(Season!=1945) %>%
  filter(round=="s") %>%
  split(., .$Season) %>%
  map(~ select(., home,visitor)) %>%
  map(unlist) %>%
  map(unique) -> semi_teams

head(semi_teams)
## $`1888`
## [1] "Preston North End"       "Wolverhampton Wanderers"
## [3] "Blackburn Rovers"        "West Bromwich Albion"   
## 
## $`1889`
## [1] "Blackburn Rovers"        "Sheffield Wednesday"    
## [3] "Wolverhampton Wanderers" "Bolton Wanderers"       
## 
## $`1890`
## [1] "Blackburn Rovers"     "Notts County"         "Sunderland"          
## [4] "West Bromwich Albion"
## 
## $`1891`
## [1] "Aston Villa"          "West Bromwich Albion" "Nottingham Forest"   
## [4] "Sunderland"          
## 
## $`1892`
## [1] "Everton"                 "Wolverhampton Wanderers"
## [3] "Preston North End"       "Blackburn Rovers"       
## 
## $`1893`
## [1] "Bolton Wanderers"    "Notts County"        "Sheffield Wednesday"
## [4] "Blackburn Rovers"

 

Now I will use the maketable_eng function to get the league table on the date of the first semi-final. I have to get each Season’s dataframe into a list only including top-tier games. Then, I make the table and filter the position of each team.
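As a rough illustration of what a table "on a given date" involves, here is a minimal base R sketch on made-up results. The teams, the scores, the `table_on_date` helper and its tie-break order are all assumptions for the example. `maketable_eng` does the real work in the code that follows, and it may break ties differently.

```r
# Toy results: hypothetical teams and scores, not taken from engsoccerdata
results <- data.frame(
  Date    = as.Date(c("1900-01-06", "1900-01-13", "1900-01-20", "1900-02-03")),
  home    = c("A", "B", "C", "A"),
  visitor = c("B", "C", "A", "C"),
  hgoal   = c(2, 1, 0, 1),
  vgoal   = c(0, 1, 3, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build a simple table from all games played on or
# before `cutoff`, with `win_pts` points for a win and 1 for a draw
table_on_date <- function(df, cutoff, win_pts = 2) {
  played <- df[df$Date <= cutoff, ]
  # one row per team per game: goals for and goals against
  long <- data.frame(
    team = c(played$home, played$visitor),
    gf   = c(played$hgoal, played$vgoal),
    ga   = c(played$vgoal, played$hgoal),
    stringsAsFactors = FALSE
  )
  long$pts <- ifelse(long$gf > long$ga, win_pts,
                     ifelse(long$gf == long$ga, 1, 0))
  long$GP <- 1
  tab <- aggregate(cbind(GP, gf, ga, pts) ~ team, data = long, FUN = sum)
  # assumed tie-break: points, then goal difference, then goals scored
  tab <- tab[order(-tab$pts, -(tab$gf - tab$ga), -tab$gf), ]
  tab$Pos <- seq_len(nrow(tab))
  rownames(tab) <- NULL
  tab
}

# Table as it stood on 31 January: the February game is ignored
toy_table <- table_on_date(results, as.Date("1900-01-31"))

# Keep only the teams of interest, as done for the semi-finalists
toy_table[toy_table$team %in% c("A", "C"), c("team", "Pos")]
```

In this toy example team A is 1st and team C is 3rd on the cut-off date, and only their rows are kept.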

 

# get top tier table on date of first semi.
england %>% 
  filter(tier==1) %>%
  split(., .$Season)  -> england_seasons

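# look up each season by name rather than by position, so the league data,
# semi_dates and semi_teams cannot drift out of step if a season is missing
table_res <- vector("list", nrow(semi_dates))
for(i in seq_len(nrow(semi_dates))){
  s <- as.character(semi_dates$Season[[i]])
  table_res[[i]] <- maketable_eng(england_seasons[[s]] %>% filter(Date<=semi_dates$Date[[i]]), 
                                  tier=1, 
                                  Season=semi_dates$Season[[i]]) %>%
                    filter(team %in% semi_teams[[s]]) 
}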


head(table_res,3)
## [[1]]
##                      team GP  W D  L gf ga        gd Pts Pos
## 1       Preston North End 22 18 4  0 74 15 4.9333333  40   1
## 2 Wolverhampton Wanderers 22 12 4  6 50 37 1.3513514  28   3
## 3        Blackburn Rovers 20  9 6  5 62 42 1.4761905  24   4
## 4    West Bromwich Albion 22 10 2 10 40 46 0.8695652  22   5
## 
## [[2]]
##                      team GP  W D  L gf ga        gd Pts Pos
## 1        Blackburn Rovers 21 12 3  6 78 38 2.0526316  27   3
## 2 Wolverhampton Wanderers 21  9 5  7 46 37 1.2432432  23   4
## 3        Bolton Wanderers 20  9 0 11 51 58 0.8793103  18   8
## 
## [[3]]
##                   team GP  W D  L gf ga       gd Pts Pos
## 1         Notts County 21 10 4  7 45 34 1.323529  24   4
## 2     Blackburn Rovers 17 10 2  5 47 31 1.516129  22   5
## 3           Sunderland 20  8 5  7 43 30 1.433333  21   6
## 4 West Bromwich Albion 18  3 2 13 27 48 0.562500   8  12

 

As you can see, these are the top-tier semi-finalists from 1888-89, 1889-90 and 1890-91. In 1889-90 one of the semi-finalists was not in the top tier: The Wednesday (now Sheffield Wednesday), who were founder members of the Football Alliance, the rival to the Football League.
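A quick way to see which semi-finalist dropped out of a season's table is `setdiff`. This is only a sketch: the two vectors are typed in by hand from the 1889-90 output shown above rather than pulled from `semi_teams` and `table_res`, and the variable names are mine.

```r
# Semi-finalists for 1889-90, copied from the semi_teams output above
semis_1889 <- c("Blackburn Rovers", "Sheffield Wednesday",
                "Wolverhampton Wanderers", "Bolton Wanderers")

# Teams that survived the top-tier filter, copied from the second table above
top_tier_1889 <- c("Blackburn Rovers", "Wolverhampton Wanderers",
                   "Bolton Wanderers")

# Semi-finalists with no top-tier league position that season
missing_1889 <- setdiff(semis_1889, top_tier_1889)
missing_1889
## [1] "Sheffield Wednesday"
```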

To make this easier to compare, let’s put the summed positions into a dataframe and show the five seasons with the lowest totals.

# top 5 Seasons for highest placed teams

table_res_pos <- lapply(table_res, function(x) x$Pos)

data.frame(
Season = semi_dates$Season,
teams = unlist(lapply(table_res_pos, length)),
total = unlist(lapply(table_res_pos, function(x) sum(as.numeric(x))))
) %>%
  filter(teams==4) %>%
  arrange(total) %>%
  head(5)
##   Season teams total
## 1   1888     4    13
## 2   2008     4    14
## 3   1964     4    15
## 4   1896     4    16
## 5   1904     4    17

 

This shows that in 1888-89 all four FA Cup semi-finalists came from the top tier of English football, and their league positions summed to 13. We saw this above: Preston North End (1st), Wolves (3rd), Blackburn Rovers (4th) and WBA (5th).

We can look up the semi-finalists for any of these seasons like this:

semi_teams[which(semi_dates$Season==1888)]
## $`1888`
## [1] "Preston North End"       "Wolverhampton Wanderers"
## [3] "Blackburn Rovers"        "West Bromwich Albion"

 

semi_teams[which(semi_dates$Season==2008)]
## $`2008`
## [1] "Chelsea"           "Everton"           "Arsenal"          
## [4] "Manchester United"

 

semi_teams[which(semi_dates$Season==1964)]
## $`1964`
## [1] "Leeds United"      "Liverpool"         "Manchester United"
## [4] "Chelsea"
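Because `split()` names each list element after its season, the three lookups can also be done in one step by indexing on the season names. A small sketch, using a hand-built stand-in list (`semi_teams_toy`, my own name) filled with the teams printed above:

```r
# Stand-in for semi_teams, holding only the three seasons printed above
semi_teams_toy <- list(
  `1888` = c("Preston North End", "Wolverhampton Wanderers",
             "Blackburn Rovers", "West Bromwich Albion"),
  `1964` = c("Leeds United", "Liverpool", "Manchester United", "Chelsea"),
  `2008` = c("Chelsea", "Everton", "Arsenal", "Manchester United")
)

# Seasons of interest, in ranked order
top_seasons <- c(1888, 2008, 1964)

# Index by name: the result keeps the order of top_seasons
top_semis <- semi_teams_toy[as.character(top_seasons)]
names(top_semis)
## [1] "1888" "2008" "1964"
```

With the real `semi_teams` list the same indexing should work unchanged, assuming every requested season is present.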

 

So, if this season’s semi-finalists hold their current league positions (1st, 2nd, 3rd and 5th) until the first semi-final, they will be, by my reckoning, the highest-placed set of semi-finalists ever.
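The arithmetic behind that claim, as a small sketch. The positions are the ones quoted at the top of this post and could change before the semi-finals are played; the variable names are mine.

```r
# League positions quoted at the top of this post (at the time of writing)
current_pos <- c(Chelsea = 1, "Tottenham Hotspur" = 2,
                 "Manchester City" = 3, Arsenal = 5)

# Lowest summed position found above (1888-89)
record_total <- 13

current_total <- sum(current_pos)
current_total
## [1] 11

# A lower total means higher-placed semi-finalists
current_total < record_total
## [1] TRUE
```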

Contact me: [@jalapic](https://twitter.com/jalapic) on Twitter, or jc3181 AT columbia DOT edu