I will begin by briefly describing the dataset. The data comes from FBref, a site that aggregates data from soccer matches in select competitions around the world. At the time I collected it, FBref sourced its data from StatsBomb, but near the end of 2022 it switched to a new provider, Opta. The Opta data differs from the StatsBomb data, so the numbers below will likely not match what you would get by repeating this process now. However, some of the metrics I obtained are not available in the new Opta data, so I will stick with the StatsBomb data throughout this analysis.
The data I gathered from FBref is a collection of 249 team-based (as opposed to player-based) metrics for each of the 98 clubs that competed in the 2021/22 season in the “Big 5” European leagues: the Premier League (England), La Liga (Spain), Ligue 1 (France), Serie A (Italy), and the Bundesliga (Germany). All cumulative stats are converted to “per match” values, because Bundesliga clubs play 34 matches in a season while clubs in the other four leagues play 38.
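As a minimal sketch of that per-match adjustment, the snippet below divides a few cumulative columns by matches played. The column names (`Matches`, `Goals`, `Passes`) and all the totals are made up for illustration; they are not the actual FBref column names or values.

```r
# Toy data: hypothetical column names and made-up season totals.
teams <- data.frame(
  Squad   = c("AAA", "BBB", "CCC"),
  League  = c("Bundesliga", "Premier League", "Ligue 1"),
  Matches = c(34, 38, 38),
  Goals   = c(68, 76, 57),
  Passes  = c(17000, 22800, 19000)
)

# Columns that are season totals and therefore need rescaling.
cumulative_cols <- c("Goals", "Passes")

# Divide each cumulative column by that team's number of matches.
per_match <- teams
per_match[cumulative_cols] <- lapply(
  teams[cumulative_cols],
  function(x) x / teams$Matches
)

per_match
```

Rates and percentages (such as a launch percentage) would be left out of `cumulative_cols`, since they are already comparable across leagues.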
Simply stated, the objective in soccer is to score goals and to prevent the other team from scoring. The most obvious definition of a “good team” is a team which scores more goals and concedes fewer goals than other teams. I promise we will go more in depth, but let’s first just examine which teams score goals and which teams concede goals.
OK, so this is a bit of a mess, and doesn’t tell us much, other than that good teams which won their league title or otherwise challenged for it, like Manchester City, Liverpool, and Bayern Munich, were in fact good and that teams which got relegated, like Norwich and Greuther Furth, were in fact bad. The “good” and “bad” quadrants are both more heavily populated than the others, suggesting that teams which score a lot are likelier than not to be defensively proficient as well.
A caveat here is that only the teams which are within the same league play each other. So comparing Liverpool and Bayern Munich, for example, doesn’t quite make sense in this case, as there is no overlap in the teams they face. The next plot will be identical to the one above, except it will be color coded by league, and the labels will be removed to reduce clutter.
Looking at this, we can see that a disproportionate number of the clubs in the lower left quadrant are Spanish, which suggests that La Liga generally sees fewer goals than the other leagues. The high-scoring quadrant (upper right), by contrast, contains more Italian and German clubs. Unsurprisingly though, there is plenty of overlap.
But soccer is a high-variance sport, even over the course of one season. Plenty of teams get lucky or unlucky and score or concede more or less than they “should” given the chances they create and allow. Instead of plotting actual goals, we can plot “expected goals” (xG), a measure of shot quality based on parameters immediately before the execution of the shot (shot location, defender positioning, and GK positioning, NOT the power or placement of the shot itself). I’ve removed penalty kick xG from this data, since penalties are strange events: they result in a goal about 76% of the time (and are accordingly given an xG value of ~0.76), but typically result from relatively low-danger situations.
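The penalty adjustment is simple arithmetic, sketched below under the assumption that each penalty attempt carries the ~0.76 xG mentioned above. The column names (`xG`, `PKatt`, `Matches`) and the numbers are hypothetical, and a data source may already provide a non-penalty xG column, in which case this step is unnecessary.

```r
# Approximate xG assigned to a single penalty kick (value quoted in the text).
pk_xg <- 0.76

# Toy season totals with hypothetical column names.
shots <- data.frame(
  Squad   = c("AAA", "BBB"),
  Matches = c(38, 34),
  xG      = c(60.0, 45.0),  # total expected goals, penalties included
  PKatt   = c(8, 3)         # penalty kicks attempted
)

# Non-penalty xG: strip out the xG contributed by penalty attempts.
shots$npxG <- shots$xG - pk_xg * shots$PKatt

# Put it on the same per-match scale as the rest of the analysis.
shots$npxG_per_match <- shots$npxG / shots$Matches

shots
```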
This looks similar in many ways to the goals vs goals conceded plot, which is what we would expect. Most of the same teams are in the same quadrants, and being good at one end seems to be associated with being good at the other. Still, some teams surely got lucky, and some inordinately so! That is difficult to see by comparing the two plots because both are too cluttered. If we instead plot the difference between goals and expected goals, both scored and conceded, we can get a better feel for this.
I’ve denoted overperformance/underperformance as “luck” in the above figure, but where is this so-called “luck” taking place? Given the nature of the expected goals model, over/underperformance at both ends occurs due to shotstopping and finishing abilities. The extent to which those attributes can be pinned to luck vs skill is a point of contention among fans and pundits. So from here, I will assume that shotstopping is a skill whereas finishing is mostly luck (it definitely is a skill at some level, but among professional players, with a few notable exceptions, finishing ability doesn’t seem to vary much from player to player with any consistency or predictability). So essentially, any attacking over/underperformance, I am putting down to luck, while defensive over/underperformance due to goalkeeping, I will attribute to skill. Defensive over/underperformance due to opposition finishing will also be considered “luck.”
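One way to make this split concrete is sketched below. It assumes access to post-shot expected goals against (`PSxGA`), which values on-target shots after they are struck; comparing it with goals conceded isolates shotstopping, and comparing it with pre-shot xG against isolates opposition finishing. This is an assumption about how such a decomposition could be done, not necessarily the exact calculation behind the figures here, and every column name and value is hypothetical.

```r
# Toy per-match values with hypothetical column names.
d <- data.frame(
  Squad = c("AAA", "BBB"),
  G     = c(1.60, 1.00),  # non-penalty goals scored
  npxG  = c(1.40, 1.20),  # non-penalty expected goals created
  GA    = c(1.00, 1.50),  # goals conceded
  xGA   = c(1.20, 1.30),  # pre-shot expected goals conceded
  PSxGA = c(1.10, 1.45)   # post-shot expected goals conceded
)

# Attacking over/underperformance: treated as finishing "luck".
d$AttackLuck <- d$G - d$npxG

# Goalkeeper saved more (or fewer) goals than shot placement implied: "skill".
d$Shotstopping <- d$PSxGA - d$GA

# Opponents struck their chances worse (or better) than expected: "luck".
d$OppFinishLuck <- d$xGA - d$PSxGA

# Overall luck: attacking luck plus opposition-finishing luck.
d$Luck <- d$AttackLuck + d$OppFinishLuck

d[, c("Squad", "AttackLuck", "Shotstopping", "OppFinishLuck", "Luck")]
```

By construction, `Shotstopping + OppFinishLuck` equals the total defensive overperformance `xGA - GA`, so nothing is double counted.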
I made this plot out of curiosity to see if there is a noticeable difference in the distribution of offensive vs defensive overperformance. It looks like the range of defensive overperformance is a bit wider and less normal-looking, but it is not clear that there is a real difference between the two.
Here, I’ve plotted teams’ defensive “luck” (overperformance due to opposition finishing) against their defensive overperformance due to their own shotstopping performance (more “deserved” overperformance since this is considered to be a “real” skill).
So who were the luckiest teams overall? Who were the unluckiest? By combining defensive overperformance due to opposition finishing with attacking overperformance, we can get an answer in terms of “goals gained due to luck per 90”:
## Squad Luck
## 1 LVT -0.3757263
## 2 RSO -0.3699368
## 3 GLA -0.3682941
## 4 FUR -0.3672471
## 5 GEN -0.3659684
## 6 RVA -0.3483053
## 7 MAL -0.3252000
## 8 HBL -0.3067882
## 9 CAG -0.2824632
## 10 LIL -0.2620421
## Squad Luck
## 89 NAP 0.2960316
## 90 SEV 0.3231684
## 91 BRS 0.3263263
## 92 PSG 0.3287158
## 93 RBL 0.3566706
## 94 DOR 0.3815765
## 95 LAZ 0.3948000
## 96 RMA 0.4319263
## 97 NAN 0.4442737
## 98 REN 0.6395368
From this data, we can see that Levante was the unluckiest team in the Big 5 leagues in 2021/22, having “lost” about 0.376 goals per match due to “luck” factors. They were followed by Real Sociedad, Borussia Monchengladbach, Greuther Furth, Genoa, Rayo Vallecano, Mallorca, Hertha Berlin, Cagliari, and Lille. 4 of these 10 teams were relegated, 3 of them by fewer than 5 points (Levante, Cagliari, and Genoa).
On the other hand, the luckiest team by some distance was Rennes, who gained a staggering 0.64 goals per match due to “luck” factors. Following them were Nantes, Real Madrid, Lazio, Borussia Dortmund, RB Leipzig, Paris Saint-Germain, Brest, Sevilla, and Napoli.
Notably, there are 4 French teams among the 10 “luckiest” teams and only 1 (Lille) among the 10 “unluckiest.” Meanwhile, there are 4 Spanish teams among the 10 “unluckiest” and only 2 (Sevilla and Real Madrid) among the “luckiest.” And there are no English teams in either category. Is this merely a coincidence? Does league have an effect on “luck”? I suppose we should first determine whether our “luck” metric is 0, on average. If not, then we are definitely incorporating some kind of skill into our luck.
##
## One Sample t-test
##
## data: Eurodata$Luck
## t = 1.1938, df = 97, p-value = 0.2355
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.01692696 0.06802757
## sample estimates:
## mean of x
## 0.0255503
A quick t-test shows us that we have no reason to conclude that “luck” isn’t zero on average: the p-value of 0.2355 is well above the usual 0.05 threshold, and the 95% confidence interval contains 0. But does league affect luck? If so, then there is something wrong with our luck metric, as intuitively, pure “luck” shouldn’t vary across location. Some leagues might have more variance in luck than others, but luck should still average to zero across all leagues.
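As a quick sanity check, the p-value and confidence interval can be rebuilt from the summary values printed in the output above. The output itself is consistent with a call along the lines of `t.test(Eurodata$Luck)`, though the original code is not shown, so treat that as an assumption.

```r
# Summary values copied from the t-test output above.
mean_x <- 0.0255503
t_stat <- 1.1938
df     <- 97            # n - 1, with n = 98 clubs

# The t statistic is mean / standard error, so the standard error is implied.
se <- mean_x / t_stat

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval for the mean.
ci <- mean_x + c(-1, 1) * qt(0.975, df) * se

round(c(se = se, p = p_value, lower = ci[1], upper = ci[2]), 4)
```

The rebuilt interval and p-value agree with the printed output up to rounding of the t statistic.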
Comparing the boxplots suggests that the differences seen at the extreme ends are likely a coincidence. There does not appear to be a significant difference in “luck” from league to league.
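The boxplot comparison is visual only. A formal check could look like the sketch below, which runs a one-way ANOVA and a Kruskal-Wallis test on simulated data with the same league sizes as the real dataset. The simulated `Luck` values and the spread of 0.2 are placeholders, so the p-values printed here say nothing about the real leagues.

```r
set.seed(1)

# League sizes in 2021/22: four leagues of 20 clubs, Bundesliga with 18.
league_sizes <- c("Premier League" = 20, "La Liga" = 20, "Ligue 1" = 20,
                  "Serie A" = 20, "Bundesliga" = 18)

# Simulated stand-in for the real data: luck drawn from one common distribution.
sim <- data.frame(
  League = factor(rep(names(league_sizes), times = league_sizes)),
  Luck   = rnorm(sum(league_sizes), mean = 0, sd = 0.2)
)

# One-way ANOVA: do the league means differ?
anova_fit <- aov(Luck ~ League, data = sim)
anova_p   <- summary(anova_fit)[[1]][["Pr(>F)"]][1]

# Kruskal-Wallis: rank-based alternative that does not assume normality.
kw <- kruskal.test(Luck ~ League, data = sim)

c(anova_p = anova_p, kruskal_p = kw$p.value)
```

With the real data, `sim` would be replaced by the team-level data frame; a large p-value from either test would back up the impression given by the boxplots.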
All of the above represents the “substance” of each team to some degree. But what about style? How do different teams go about trying to win games? Does style vary across quality of team? Does style vary from league to league?
A basic element of “style” is whether or not teams attempt to “play out from the back” to build attacks, vs playing more directly in an attempt to avoid turning the ball over in bad areas. One way to measure this is simply by looking at how often teams play short passes from goal kicks vs launching the ball.
## # A tibble: 10 x 2
## Squad GKLaunchPct
## <chr> <dbl>
## 1 MCI 24.8
## 2 STU 31.3
## 3 HOF 27.1
## 4 BAY 14.2
## 5 PSG 14.6
## 6 LYO 31.2
## 7 NAP 21.2
## 8 LAZ 28.2
## 9 INT 28.9
## 10 BLG 32.2
## # A tibble: 10 x 2
## Squad GKLaunchPct
## <chr> <dbl>
## 1 BUR 93.1
## 2 EVE 82.4
## 3 NEW 86.6
## 4 WAT 86.2
## 5 OSA 87.5
## 6 GET 84
## 7 CAD 83.5
## 8 NAN 80.4
## 9 MET 84
## 10 TOR 89.2
The ten teams who launched the smallest percentage of goal kicks were Bayern Munich, PSG, Napoli, Manchester City, Hoffenheim, Lazio, Inter Milan, Lyon, Stuttgart, and Bologna. 3 of these teams were the champions of their domestic leagues, while 2 others finished in the top 4 of their leagues.
The teams who launched the most goal kicks were Burnley, Torino, Osasuna, Newcastle, Watford, Getafe, Metz, Cadiz, Everton, and Nantes. 3 of these teams got relegated, and most of the others were near the bottom of their respective league tables.
But does this vary across leagues?
In contrast to the “luck” comparison, the boxplot does appear to show variation between leagues. Teams in Germany and Italy were more likely to play short, while teams in Spain were more likely to launch their goal kicks.
Is there an association between being “good” and goal kick selection? The composition of teams in the “most likely to play short/long” lists above suggests that teams that are good typically play shorter from goal kicks. Let’s compare Launch Percentage and Goal Difference for all of the teams.
## [1] -0.5458909
Graphically, there appears to be a moderate negative correlation between Launch Percentage and Goal Difference: in general, teams that win play more of their goal kicks short. The observed correlation coefficient of -0.546 says the same thing numerically.
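For readers who want to reproduce this kind of check, here is a small sketch of computing and testing a Pearson correlation in base R. The seven data points are invented purely to show the calls; only `r_reported` comes from the output above.

```r
# Invented toy values: launch percentage and per-match goal difference.
launch_pct <- c(15, 25, 35, 50, 65, 80, 90)
goal_diff  <- c(1.5, 0.9, 0.4, 0.2, -0.3, -0.2, -0.8)

# Pearson correlation coefficient.
r <- cor(launch_pct, goal_diff)

# cor.test() adds a p-value and a confidence interval for the coefficient.
test <- cor.test(launch_pct, goal_diff)

# Squaring the coefficient reported above gives the share of variance in
# goal difference that a straight-line fit on launch percentage accounts for.
r_reported <- -0.5458909
r_squared  <- r_reported^2

c(toy_r = r, toy_p = test$p.value, reported_r_squared = r_squared)
```

The squared coefficient comes out at roughly 0.30, so launch percentage is associated with a meaningful share of the variation in goal difference while leaving most of it unexplained.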
But how do teams use the ball when they have it? We will examine number of passes vs non-penalty expected goals.
We can actually see a fairly strong and positive correlation between passes attempted and xG generated. There are a few efficient teams like Mainz, Union Berlin, and Udinese who attempt few passes while still generating an above average amount of xG, and a few teams like Brighton, Clermont, Celta Vigo, and Sevilla who generated little xG despite passing a lot. The vast majority of teams, however, either pass more than average while generating more xG than average, or are the opposite.
But how do teams pass the ball? Are they actually progressing the ball up the field, or are they getting pinned in their own end?
This actually appears to have an even stronger correlation than the last plot. Teams who pass the ball a lot also have more forward passing distance. Arsenal was really the only especially ponderous team last season. What if we look at progressive distance over total distance instead?
Once again, we have a clear correlation. Teams who pass the ball more tend to have a lower ratio of progressive distance to total passing distance. Remember, though, that teams who pass more also tend to have higher total progressive distance. This suggests that their total passing distance (which includes sideways and backward distance) grows even faster than their progressive distance, which could mean that these teams use sideways and backward passes strategically in order to progress the ball further up the field.
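The two statements fit together once the ratio is written out. In the toy example below (hypothetical column names and made-up per-match values), the high-volume passing team covers more progressive distance in absolute terms but a smaller share of its total passing distance is progressive.

```r
# Toy per-match values with hypothetical column names.
passing <- data.frame(
  Squad   = c("HighVolume", "LowVolume"),
  Passes  = c(650, 400),
  TotDist = c(11000, 7500),  # total passing distance
  PrgDist = c(3300, 2700)    # progressive (forward) passing distance
)

# Share of total passing distance that moves the ball toward goal.
passing$PrgShare <- passing$PrgDist / passing$TotDist

passing
```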
What about defensive styles? Where and how often do teams press? Let’s start with “How often?” as well as “How successful?” We will simply look at pressures attempted vs success rate.
This appears somewhat surprising. Many teams who are known for pressing (Manchester City, Liverpool, Bayern Munich) do not appear to be frequent pressers. However, all three of these teams typically have the ball for a large portion of the game, so they don’t have as many opportunities to accumulate pressures. So we may need to adjust for time in possession.
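One simple way to make that adjustment is to divide pressures by the share of the match a team spends without the ball. The exact adjustment used for the next plot is not spelled out, so the formula below is an assumption, and the column names and numbers are hypothetical.

```r
# Toy per-match values with hypothetical column names.
pressing <- data.frame(
  Squad     = c("AAA", "BBB"),
  Pressures = c(140, 160),  # pressures attempted per match
  SuccPct   = c(33, 28),    # pressure success rate (%)
  Poss      = c(65, 40)     # share of possession (%)
)

# Share of the match in which the opponent has the ball.
pressing$OutOfPossShare <- 1 - pressing$Poss / 100

# Pressures scaled to a full match without the ball.
pressing$AdjPressures <- pressing$Pressures / pressing$OutOfPossShare

pressing
```

In this toy case the team with 65% possession attempts fewer raw pressures (140 vs 160) but far more once opportunity is accounted for (400 vs about 267).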
This is a more intuitive and expected result. Teams like Bayern and Liverpool appear as both good and frequent pressers, and there is a correlation between frequency and success rate of pressures. A few teams, like PSG, Leicester, and Rennes are frequent but ineffective pressers, while Udinese and Union Berlin are infrequent but effective with their pressing. Are there stylistic differences between leagues here?
This appears to suggest that German teams press frequently and successfully compared to teams in other leagues.
These boxplots support the finding from above that German teams appear to press more often and with more success than other teams. English and Spanish teams also seem relatively reluctant and ineffective with their pressing.