Introduction

In this project, I wanted to analyse every match of the 2025 Women’s Euros and see how each team performed according to the stats. I recorded the statistics of every game played at the tournament in an Excel sheet (using Google’s match statistics). Here is the code I used to analyse the trends in the data.

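# Load the packages used throughout: dplyr for %>% and the data verbs, ggplot2 for the plots
library(dplyr)
library(ggplot2)

# Read the match-level statistics from the Excel workbook
df <- readxl::read_excel("C:/Users/emman/OneDrive/Documents/Data analysis/Portfolio/Womens Euros.xlsx",
                         sheet = "Match Results")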

# Convert decimals to percentages
df <- df %>%
  mutate(
    Possession = Possession * 100,
    `Pass %`   = `Pass %` * 100,
    `Tackle %` = `Tackle %` * 100
  )
#Put the data into a league table
team_summary <- df %>%
  group_by(Team) %>%
  summarise(
    Matches       = n(),
    Goals_For     = sum(`Goals Scored`, na.rm=TRUE),
    Goals_Against = sum(`Goals Conceded`, na.rm=TRUE),
    Wins          = sum(Win == "Yes", na.rm=TRUE),
    Draws         = sum(Draw == "Yes", na.rm=TRUE),
    Avg_Poss      = mean(Possession, na.rm=TRUE),
    Avg_Pass      = mean(`Pass %`, na.rm=TRUE),
    Avg_Tackle    = mean(`Tackle %`, na.rm=TRUE)
  ) %>%
#Calculate the standings
  mutate(
    Goal_Diff = Goals_For - Goals_Against,
    Points = Wins*3 + Draws*1,
    PPG = Points / Matches
  ) %>%
  arrange(desc(Points), desc(Goal_Diff))

team_summary

Here we have a league table for all the teams competing in the Euros. Teams that played more games will generally have more points, so in what follows I will use points per game to measure how well a team performed compared to others. Initially, we can see that Spain outperformed England over the same number of games and that Spain clearly had the most possession, so this will be something to explore over the course of this project.
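To make the points per game idea concrete, here is a minimal sketch using two made-up teams (the names and records are hypothetical, not taken from the tournament data). It applies the same scoring rule as the league table above: 3 points for a win and 1 for a draw.

```r
# Hypothetical records for two made-up teams (not tournament data)
toy <- data.frame(
  Team    = c("Team A", "Team B"),
  Matches = c(6, 3),
  Wins    = c(3, 2),
  Draws   = c(2, 0)
)

# Same scoring rule as the league table: 3 points per win, 1 per draw
toy$Points <- toy$Wins * 3 + toy$Draws * 1
toy$PPG    <- toy$Points / toy$Matches

toy
```

In this toy example Team A has more points in total (11 against 6) but Team B has the higher points per game (2 against roughly 1.83), which is why ranking on totals alone would favour teams that simply played more matches.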

#Plot a bar graph of points per game
ggplot(team_summary, aes(x=reorder(Team, PPG), y=PPG, fill=Goal_Diff)) +
  geom_col() +
  coord_flip() +
  labs(title="Tournament Standings: Points per Game",
       x=NULL, y="PPG", fill="Goal Diff") +
  theme_minimal()

This bar graph compares each team’s points per game to give a fair comparison of how teams performed relative to the number of matches they played. I chose this as the best method of comparison because a team could be statistically one of the best performers in the tournament but get knocked out on penalties early on, which would skew a comparison based on total points. The graph also uses a colour gradient to show goal difference, which lets us see how that stacks up against points per game. For example, we can see that Spain were the best performers in terms of results and, interestingly, England are only 5th despite winning the tournament. There is, unsurprisingly, an upward trend between points per game and goal difference, although England, despite being 5th on points per game, have the second highest goal difference. Their lower points per game is probably due to the number of draws they had in the tournament.

#Plot a scatter graph comparing goals for and against
avg_goals_for <- mean(team_summary$Goals_For)
avg_goals_against <- mean(team_summary$Goals_Against)
att_def_cor <- cor(team_summary$Goals_Against, team_summary$Goals_For, 
                   use = "complete.obs")

ggplot(team_summary, aes(x=Goals_Against, y=Goals_For, label=Team)) +
  geom_point(size=3, colour="steelblue") +
  geom_text(vjust=-0.7, size=3) +
  geom_vline(xintercept=avg_goals_against, linetype="dashed", colour="red") +
  geom_hline(yintercept=avg_goals_for, linetype="dashed", colour="red") +
  geom_smooth(method="lm", se=FALSE, colour="black", linetype="dotted") +
  labs(title="Attack vs Defence",
       subtitle = paste0("Correlation: ",
                      round(att_def_cor, 2)),
       x="Goals Conceded", y="Goals Scored") +
  theme_minimal()

This graph compares each team’s attack and defence simply in terms of goals for and against. There is a moderately weak negative correlation here, implying that teams who scored a lot also tended to concede few goals, which suggests there were significant quality gaps in the tournament. We can see that Spain and England were well above the line of best fit. It is interesting to note that, despite playing the most games, Spain conceded the third fewest goals, which shows their dominance in the tournament. These results are skewed, as teams that played fewer games had less opportunity to score or concede, so an alternative (and possibly better) approach is to look at goals scored and conceded per game.

#Plot a scatter graph comparing goals for and against per game
goalsfor_per_game <- team_summary$Goals_For / team_summary$Matches
goalsagainst_per_game <- team_summary$Goals_Against / team_summary$Matches
avg_gfpg <- mean(goalsfor_per_game)
avg_gapg <- mean(goalsagainst_per_game)
att_def_cor <- cor(goalsagainst_per_game, goalsfor_per_game, 
                   use = "complete.obs")

ggplot(team_summary, aes(x=goalsagainst_per_game, y=goalsfor_per_game, 
                         label=Team)) +
  geom_point(size=3, colour="steelblue") +
  geom_text(vjust=-0.7, size=3) +
  geom_vline(xintercept=avg_gapg, linetype="dashed", colour="red") +
  geom_hline(yintercept=avg_gfpg, linetype="dashed", colour="red") +
  geom_smooth(method="lm", se=FALSE, colour="black", linetype="dotted") +
  labs(title="Attack vs Defence 2",
       subtitle = paste0("Correlation: ",
                      round(att_def_cor, 2)),
       x="Goals Conceded per Game", y="Goals Scored per Game") +
  theme_minimal()

This is a fairer comparison as it controls for the number of games played. There is a stronger negative correlation here between goals scored and conceded per game, which reinforces the finding that teams who scored more goals also tended to concede fewer in this tournament.
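One possible variation, sketched here on made-up numbers, is to store the per-game rates as columns of the summary table rather than as separate vectors, so each value stays attached to its team. The data frame `toy_summary` and the column names `GF_per_game` and `GA_per_game` are hypothetical, and I have used base R's `transform()` to keep the snippet dependency-free; `mutate()` from dplyr would do the same job.

```r
# Hypothetical summary table for three made-up teams (not tournament data)
toy_summary <- data.frame(
  Team          = c("Team A", "Team B", "Team C"),
  Matches       = c(6, 4, 3),
  Goals_For     = c(18, 6, 3),
  Goals_Against = c(3, 6, 6)
)

# Add per-game rates as columns so they stay aligned with each team
toy_summary <- transform(
  toy_summary,
  GF_per_game = Goals_For / Matches,
  GA_per_game = Goals_Against / Matches
)

# Correlation between goals conceded and goals scored, both per game
per_game_cor <- cor(toy_summary$GA_per_game, toy_summary$GF_per_game)

toy_summary
```

With the rates stored as columns, they can be referred to by name inside `aes()` and they survive any later sorting or filtering of the table.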

#Plot a scatter graph comparing tackling with goals conceded
avg_Avg_Tackle <- mean(team_summary$Avg_Tackle)
avg_goals_against <- mean(team_summary$Goals_Against)
tackle_cor <- cor(team_summary$Avg_Tackle, team_summary$Goals_Against, 
                  use = "complete.obs")

ggplot(team_summary, aes(x=Avg_Tackle, y=Goals_Against, label=Team)) +
  geom_point(size=3, colour="darkgreen") +
  geom_text(vjust=-0.7, size=3) +
  geom_vline(xintercept=avg_Avg_Tackle, linetype="dashed", colour="red") +
  geom_hline(yintercept=avg_goals_against, linetype="dashed", colour="red") +
  geom_smooth(method="lm", se=FALSE, colour="black", linetype="dotted") +
  labs(title="Tackle Success Importance",
       subtitle = paste0("Correlation: ", round(tackle_cor, 2)),
       x="Average Tackle (%)", y="Goals Conceded") +
  theme_minimal()

This graph shows a weak positive correlation between tackle success percentage and goals conceded. This interesting (and possibly surprising) finding might be due to strength of defence owing more to higher possession and less to tackling.

#Plot a scatter graph comparing possession with goals conceded
avg_Avg_Poss <- mean(team_summary$Avg_Poss)
avg_goals_against <- mean(team_summary$Goals_Against)
poss_def_cor <- cor(team_summary$Avg_Poss, team_summary$Goals_Against,
                    use = "complete.obs")

ggplot(team_summary, aes(x=Avg_Poss, y=Goals_Against, label=Team)) +
  geom_point(size=3, colour="darkgreen") +
  geom_text(vjust=-0.7, size=3) +
  geom_vline(xintercept=avg_Avg_Poss, linetype="dashed", colour="red") +
  geom_hline(yintercept=avg_goals_against, linetype="dashed", colour="red") +
  geom_smooth(method="lm", se=FALSE, colour="black", linetype="dotted") +
  labs(title="Possession vs Goals Conceded",
       subtitle = paste0("Correlation: ", round(poss_def_cor, 2)),
       x="Average Possession (%)", y="Goals Conceded") +
  theme_minimal()

Here there is a much clearer trend, with a moderately strong negative correlation showing that higher possession was associated with fewer goals conceded. It would be interesting to perform further analysis with a much larger data set to test which statistic is most important in a team’s defence.
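As a rough sketch of how that comparison could start, the snippet below computes the correlation of several statistics with goals conceded in one pass and ranks them by strength. The numbers are made up for eight hypothetical teams, and a correlation ranking is only a first look rather than a test of which statistic matters most.

```r
# Hypothetical team averages for eight made-up teams (not tournament data)
toy <- data.frame(
  Avg_Poss      = c(62, 58, 55, 51, 49, 46, 42, 38),
  Avg_Pass      = c(88, 85, 80, 82, 78, 75, 77, 70),
  Avg_Tackle    = c(55, 60, 48, 63, 52, 58, 50, 61),
  Goals_Against = c(3, 4, 5, 4, 7, 6, 8, 9)
)

# Correlate each candidate statistic with goals conceded
predictors <- c("Avg_Poss", "Avg_Pass", "Avg_Tackle")
cors <- sapply(toy[predictors], function(col) {
  cor(col, toy$Goals_Against, use = "complete.obs")
})

# Rank by absolute strength, ignoring the direction of the relationship
sort(abs(cors), decreasing = TRUE)
```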

#Plot a scatter graph comparing possession with ppg
avg_Avg_Poss <- mean(team_summary$Avg_Poss)
avg_PPG <- mean(team_summary$PPG)
poss_ppg_cor <- cor(team_summary$Avg_Poss, team_summary$PPG, 
                    use = "complete.obs")

ggplot(team_summary, aes(x=Avg_Poss, y=PPG, label=Team)) +
  geom_point(size=3, colour="firebrick") +
  geom_text(vjust=-0.7, size=3) +
  geom_vline(xintercept=avg_Avg_Poss, linetype="dashed", colour="red") +
  geom_hline(yintercept=avg_PPG, linetype="dashed", colour="red") +
  geom_smooth(method="lm", se=FALSE, colour="black", linetype="dotted") +
  labs(title="Possession vs Success",
       subtitle = paste0("Correlation: ", round(poss_ppg_cor, 2)),
       x="Average Possession (%)", y="Points per Game") +
  theme_minimal()

This graph shows a very clear trend, with a strong correlation between higher possession and more success (as measured by points per game). This could indicate a link between possession and success, or it could be a result of the stronger teams in the tournament having more of the ball.
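With only one value per team, a single correlation figure can look more certain than it is. One way to get a feel for that uncertainty, sketched here on made-up values for sixteen hypothetical teams, is base R's `cor.test()`, which reports a confidence interval and p-value alongside the Pearson correlation. The vectors `poss` and `ppg` are illustrative only.

```r
# Hypothetical possession (%) and points per game for sixteen made-up teams
poss <- c(38, 42, 45, 47, 48, 50, 51, 52, 53, 55, 56, 58, 60, 62, 65, 70)
ppg  <- c(0, 1, 0.33, 1, 1.5, 0, 1.25, 1, 2, 1.5, 1, 2, 1.75, 2, 2.2, 2.5)

# Pearson correlation test: estimate, 95% confidence interval and p-value
res <- cor.test(poss, ppg)

res$estimate   # the same value cor(poss, ppg) would give
res$conf.int   # range of correlations consistent with a sample this small
res$p.value
```

A wide interval would be a reminder that the trend, however clear on the graph, rests on a small sample.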

Conclusion

In conclusion, this project shows that the best performing team in the 2025 Euros was Spain according to results per match. There seems to be a clear correlation between possession and positive results, although the data set is far too small to conclude whether possession is the most important factor in success. In future projects, I would like to further explore the key to success over a much larger data set and see how this tournament compares.