UCL Analysis

Author

Stephen Klayer

UEFA Champions League

The UEFA Champions League is one of the largest annual sporting competitions in the world. Guinness World Records crowned it the “Greatest sports competition in history.” The competition runs from September to May and crowns the best club soccer team in Europe, which is usually considered the best team in the world as well.

Clubs also often treat it as their most important competition, ahead of the domestic league title and the other trophies available to them.

Goal: Given the importance of the competition, which variables, aside from goals, matter most in allowing a team to win a game?

Intro into the Data

A Kaggle dataset holds the match records from the 2020-21 Champions League season, https://www.kaggle.com/datasets/mcarujo/uefa-champions-league-20202021. I trimmed and modified the data so that it includes only 18 relevant columns. These variables include:

Variable Name | Variable Explanation | Variable Type
stage | The phase or stage of the game | String
date | Date of the game in day, month, year | String
team_name_home | The home team of the respective game | String
team_name_away | The away team of the respective game | String
team_home_score | Goals scored by the home team | Numeric
team_away_score | Goals scored by the away team | Numeric
possession_home | % possession of the ball by the home team | Numeric
possession_away | % possession of the ball by the away team | Numeric
total_shots_home | Number of shots by the home team | Numeric
total_shots_away | Number of shots by the away team | Numeric
shots_on_target_home | Number of shots on target by the home team | Numeric
shots_on_target_away | Number of shots on target by the away team | Numeric
duels_won_home | % of 1 on 1 duels won by the home team | Numeric
duels_won_away | % of 1 on 1 duels won by the away team | Numeric
prediction_team_home_win | Pre-game predicted chance of the home team winning | Numeric
prediction_draw | Pre-game predicted chance of a draw | Numeric
prediction_team_away_win | Pre-game predicted chance of the away team winning | Numeric
location | The venue/stadium | String

I also added some of my own variables:

Variable Name | Variable Explanation | Variable Type
home_result | Numeric code for the home team’s result (win, draw, or loss) | Numeric
shooting_home_percent | Shots on target as a percentage of total shots by the home team | Numeric
shooting_away_percent | Shots on target as a percentage of total shots by the away team | Numeric
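The derived variables above could be computed along the following lines. This is only a sketch in Python using the standard library, not the code used for the analysis: the 1 / 0 / -1 coding for `home_result` is an assumption (the report only says that different numbers are given for a win, draw, or loss), and the example row is made up.

```python
def home_result(home_goals, away_goals):
    """Code the home team's result. The 1/0/-1 coding is an assumption."""
    if home_goals > away_goals:
        return 1    # home win
    if home_goals < away_goals:
        return -1   # home loss
    return 0        # draw


def shooting_percent(shots_on_target, total_shots):
    """Shots on target as a percentage of total shots; None if no shots were taken."""
    if total_shots == 0:
        return None
    return 100.0 * shots_on_target / total_shots


# Made-up match row that reuses the dataset's column names.
match = {
    "team_home_score": 2, "team_away_score": 1,
    "total_shots_home": 12, "shots_on_target_home": 6,
    "total_shots_away": 8, "shots_on_target_away": 2,
}
match["home_result"] = home_result(match["team_home_score"], match["team_away_score"])
match["shooting_home_percent"] = shooting_percent(
    match["shots_on_target_home"], match["total_shots_home"])
match["shooting_away_percent"] = shooting_percent(
    match["shots_on_target_away"], match["total_shots_away"])
```

Returning `None` for a team with zero shots avoids a division by zero and keeps that match out of any accuracy averages.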

UEFA Champions League Analysis

Shooting Accuracy

More shots will likely mean more goals, so instead of raw shot counts I considered the percentage of shots that were on target. This highlights teams that create good attempts on target rather than teams that simply shoot whenever they can.

Unsurprisingly, the better a team’s shooting accuracy, the better its chance of winning. Even with a goalkeeper in the way, any shot on target could lead to a goal. Overall, being more accurate than the opponent matters more than taking every chance available. The result also shows the importance of defense: when away teams have a low shooting percentage, they are likely to lose, so good defense is arguably more important than better shooting ability.

Possession vs Duels

Possession often shows which team is the most dominant. I also looked at duels won to see which team played better in 1 on 1 situations.

At first, these results seem all over the place. The lines mark 50% for each variable, which is where two teams that were equally strong during the game would fall. Teams with more possession do not seem to win much more often than they lose. The more defining variable is duels won: more losses are associated with winning fewer duels than with having less possession, which shows the importance of 1 on 1 situations in games.
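One way to put numbers on the comparison above is to count home results on each side of the two 50% lines. The sketch below is in Python with only the standard library; the rows are made up for illustration, the 1 / 0 / -1 result coding is an assumption, and the helper names `quadrant` and `tally` are hypothetical.

```python
from collections import Counter


def quadrant(possession_home, duels_won_home):
    """Place the home team relative to the two 50% reference lines.

    Values of exactly 50 fall in the lower group.
    """
    possession = "more possession" if possession_home > 50 else "less possession"
    duels = "more duels" if duels_won_home > 50 else "fewer duels"
    return possession, duels


def tally(matches):
    """Count home results within each possession/duels quadrant."""
    counts = Counter()
    for m in matches:
        key = quadrant(m["possession_home"], m["duels_won_home"]) + (m["home_result"],)
        counts[key] += 1
    return counts


# Made-up rows for illustration only; home_result uses an assumed 1/0/-1 coding.
matches = [
    {"possession_home": 62, "duels_won_home": 45, "home_result": -1},
    {"possession_home": 41, "duels_won_home": 56, "home_result": 1},
    {"possession_home": 58, "duels_won_home": 53, "home_result": 1},
    {"possession_home": 64, "duels_won_home": 47, "home_result": -1},
]
counts = tally(matches)
```

Comparing the loss counts in the "fewer duels" quadrants with those in the "less possession" quadrants gives a direct check on which variable is more closely tied to losing.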

Home Team Advantage

Because I ordered the previous graphs by home result, I wanted to check whether this introduced bias, since the home team is usually assumed to have the advantage.

The home team does not seem to have a real advantage. There are almost as many home losses as home wins, which suggests there is no real home team advantage in the competition and likely little to no bias in focusing on the home team result each time.
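The home advantage check comes down to comparing the shares of home wins, draws, and home losses. A minimal Python sketch follows, assuming `home_result` is coded 1 / 0 / -1 (an assumption) and using a made-up list of results rather than the real season.

```python
from collections import Counter


def result_shares(home_results):
    """Return the share of home wins, draws, and home losses among all games."""
    counts = Counter(home_results)
    total = len(home_results)
    labels = {1: "home_win", 0: "draw", -1: "home_loss"}  # assumed coding
    return {name: counts[code] / total for code, name in labels.items()}


# Made-up results for illustration only.
home_results = [1, -1, 0, 1, -1, 0, 1, -1]
shares = result_shares(home_results)
```

If the `home_win` and `home_loss` shares are close, as in this toy list, there is little sign of a home advantage.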

Data Prediction

Since the Champions League data includes a pre-game prediction for each game, I wanted to test the accuracy of these predictions to see whether the variables used to predict outcomes are truly reliable.

The further a dot is from the line, the higher the predicted chance of a draw. As the data moves up and to the right, the predicted chance of a home win rises, so those dots should be blue. This does seem to hold, and the expected winners are mostly correct, with a few outliers. The surprise is the somewhat high number of draws close to the line, where the predicted chance of a draw is low. Too many draws occur when one team is heavily favored, and too few occur when each team has a roughly 50-50 chance. The draw odds seem consistently too low and should be increased to better predict outcomes. So while the predicted winners are fairly accurate, I think the draw estimates are poor, and some favored teams are heavily overestimated.
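The two observations above (winners mostly right, draws underestimated) can be checked separately. The Python sketch below assumes the three prediction columns are percentages that sum to roughly 100, which the dataset description does not confirm; the `actual` field and the rows themselves are made up for illustration.

```python
def predicted_outcome(p_home, p_draw, p_away):
    """Return the outcome with the highest predicted chance."""
    chances = {"home": p_home, "draw": p_draw, "away": p_away}
    return max(chances, key=chances.get)


def evaluate(matches):
    """Return (share of matches where the favorite outcome happened,
    mean predicted draw chance in %, observed draw rate in %)."""
    n = len(matches)
    hits = sum(
        predicted_outcome(m["prediction_team_home_win"], m["prediction_draw"],
                          m["prediction_team_away_win"]) == m["actual"]
        for m in matches
    )
    mean_draw_prediction = sum(m["prediction_draw"] for m in matches) / n
    draw_rate = 100.0 * sum(m["actual"] == "draw" for m in matches) / n
    return hits / n, mean_draw_prediction, draw_rate


# Made-up rows; predictions are assumed to be percentages.
matches = [
    {"prediction_team_home_win": 60, "prediction_draw": 25,
     "prediction_team_away_win": 15, "actual": "home"},
    {"prediction_team_home_win": 50, "prediction_draw": 30,
     "prediction_team_away_win": 20, "actual": "draw"},
    {"prediction_team_home_win": 20, "prediction_draw": 25,
     "prediction_team_away_win": 55, "actual": "away"},
    {"prediction_team_home_win": 45, "prediction_draw": 30,
     "prediction_team_away_win": 25, "actual": "draw"},
]
accuracy, mean_draw_prediction, draw_rate = evaluate(matches)
```

An observed draw rate well above the mean predicted draw chance would support the claim that the draw odds are set too low.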

Attacking Statistics

Using the dataset, I created a new data frame that groups teams together to find each team’s overall stats.

I looked at each team’s offensive ability. By mutating the data, I found where each team finished and its average goals per match, to see whether scoring a lot leads to the highest placement or whether the teams that go further focus more on defense.

Aside from the teams that failed to make it out of the group stage, the bars are fairly similar. The teams that reached the semi-finals and final do not have the best overall attack. This suggests defense is more important in these games, as these teams did not score as much as the teams that were eliminated in the quarter-finals.
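Grouping match rows into per-team averages means counting each match twice, once for the home side and once for the away side. A rough Python sketch of that step is below; the team names and scores are placeholders, not real results, and the function name `goals_per_match` is hypothetical.

```python
from collections import defaultdict


def goals_per_match(matches):
    """Average goals scored per match for every team, home and away combined."""
    goals = defaultdict(int)
    played = defaultdict(int)
    for m in matches:
        # Each match contributes one appearance to both teams.
        goals[m["team_name_home"]] += m["team_home_score"]
        played[m["team_name_home"]] += 1
        goals[m["team_name_away"]] += m["team_away_score"]
        played[m["team_name_away"]] += 1
    return {team: goals[team] / played[team] for team in played}


# Placeholder teams and scores for illustration only.
matches = [
    {"team_name_home": "Team A", "team_name_away": "Team B",
     "team_home_score": 2, "team_away_score": 1},
    {"team_name_home": "Team B", "team_name_away": "Team C",
     "team_home_score": 0, "team_away_score": 0},
    {"team_name_home": "Team C", "team_name_away": "Team A",
     "team_home_score": 1, "team_away_score": 3},
]
averages = goals_per_match(matches)
```

The resulting averages can then be joined to each team's final stage to compare attack strength against placement.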

Secondary Source

I scraped the UEFA Champions League website for all-time data, https://www.uefa.com/uefachampionsleague/history/rankings/. Each column lists the top 100 teams in a given category, so some teams have NA values because they are in the top 100 in one category but not in another. This is helpful, because teams with very few games played, which may be outliers, get filtered out. Teams that have played in something like one Champions League campaign should not be included given what this document looks for. The complete entries represent the best of the best, which is the group I want to compare the 2020-21 campaign against.

The data is located at https://myxavier-my.sharepoint.com/:x:/g/personal/klayers_xavier_edu/IQBO-XvBE_dFTZzvRsNg-DW3AWk3k5rmxhKXmx99tmOijno?rtime=mq2Vfj6t3kg.
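Keeping only the complete entries, as described above, amounts to dropping any team with a missing value in at least one category. A small Python sketch follows, with `None` standing in for NA; the column names `goals_scored`, `goals_conceded`, and `win_pct` and all of the values are hypothetical.

```python
def complete_rows(rows):
    """Keep only rows where every field has a value (None stands in for NA)."""
    return [row for row in rows if all(value is not None for value in row.values())]


# Hypothetical columns and made-up values for illustration only.
teams = [
    {"team": "Team A", "goals_scored": 500, "goals_conceded": 300, "win_pct": 58.0},
    {"team": "Team B", "goals_scored": None, "goals_conceded": 40, "win_pct": None},
    {"team": "Team C", "goals_scored": 350, "goals_conceded": 280, "win_pct": 51.5},
]
complete = complete_rows(teams)
```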

Based on the last visual used, I wanted to double check this idea that defense may be more important than offense in this competition.

The slopes of the two graphs differ heavily: goals scored has a steeper slope than goals conceded. Measured against win percentage, which matters especially in the later rounds, this model indicates that offense is more important overall than defense. Scoring more goals has a stronger effect on win percentage than conceding fewer, showing that throughout UCL history, offense has been more important. This likely means 2020-21 was an outlier year in terms of the teams that advanced to the semi-finals and final.
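The slope comparison above can be reproduced with an ordinary least-squares fit of win percentage on each variable. The Python sketch below uses only the standard library and made-up numbers. It assumes both predictors are on the same scale (goals per game), since slopes are only comparable when the units match.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Made-up values for illustration only: per-game averages and win percentages.
win_pct = [40.0, 50.0, 60.0]
goals_scored_per_game = [1.0, 1.5, 2.0]
goals_conceded_per_game = [1.5, 1.0, 0.5]

slope_scored = ols_slope(goals_scored_per_game, win_pct)
slope_conceded = ols_slope(goals_conceded_per_game, win_pct)

# Compare magnitudes: the conceded slope is expected to be negative.
offense_matters_more = abs(slope_scored) > abs(slope_conceded)
```

In this toy data the two slopes have equal magnitude, so `offense_matters_more` is `False`; with the all-time data, a larger magnitude for goals scored would match the conclusion drawn above.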

Conclusion

Overall, there are many important factors that contribute to a team’s success in the Champions League. Based on 2020-21 data, teams that excel in winning duels are more likely to win than those that fall behind in 1 on 1 situations. The pre-game predictions are also generally good at picking the winner before the match, although they appear to underestimate draws.

Variables that may seem important but have a more minimal effect include home field and possession. Home teams did not win more often than away teams, so there seemed to be no real advantage for the team playing in its own stadium. Teams that controlled the ball more also did not show a clearly higher winning percentage than teams that had the ball less. This could be caused by teams focusing on defense and on preventing opposing attacking opportunities, which leaves the opposing team holding onto the ball for longer. The importance of defense was also reflected this season in the fact that the semi-finalists and finalists scored fewer goals per match than the teams eliminated in the quarter-finals. Across UCL history this pattern seems to hold less well, though the all-time data could be biased toward results from decades ago.