1 Introduction

The FIFA Women’s World Cup represents the highest level of international competition in women’s football and provides a valuable context for analysing team performance. As the game continues to grow in popularity and competitiveness, data-driven analysis has become an essential tool for understanding the factors associated with success.

This report presents an exploratory data analysis of team-level performance across multiple Women’s World Cup tournaments. The dataset contains information on team characteristics, playing style and performance indicators such as goals scored, assists, possession, discipline and matches played.

The primary objective of this analysis is to identify patterns and relationships in key performance metrics. In particular, the report focuses on attacking efficiency (e.g., goals per 90 minutes), possession as an indicator of playing style and tournament progression measured through matches played.

Using the tidyverse framework in R, the analysis produces a range of descriptive statistics and visualisations to explore the distribution of performance indicators, compare teams across tournaments and highlight factors that may be associated with deeper progression in the competition.

This exploratory approach aims to provide insight into how different performance dimensions contribute to success at the highest level of women’s international football.

1.1 KPI Summary Statistics

Summary Statistics for Key Performance Indicators
KPI              Mean    SD     Min   Max
Goals per 90     1.38    0.92   0     4.69
Possession (%)   49.25   6.58   30    63.00

The summary statistics provide an overview of overall team performance across Women’s World Cup tournaments. On average, teams scored 1.38 goals per 90 minutes, with considerable variation observed across teams (SD = 0.92), indicating differences in attacking efficiency. Goal-scoring performance ranged from 0 to 4.69 goals per 90, highlighting the presence of both low-scoring teams and highly productive attacking sides.

Average possession was 49.25%, with a relatively wide spread (SD = 6.58) and values ranging from 30% to 63%. This variation reflects differences in playing style, from teams that adopt a more reactive approach to those that emphasise ball control and territorial dominance.

Matches played ranged from three to seven, corresponding to elimination at the group stage through to reaching the final or third-place play-off. This range supports the use of matches played as a proxy measure of tournament progression in the subsequent analysis of performance factors associated with success.
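As an illustration of how a summary table of this kind can be produced with the tidyverse, the following minimal sketch computes the mean, standard deviation, minimum and maximum for two KPIs. The data frame `teams`, its column names and all values are assumptions made for the example and are not taken from the report's dataset.

```r
library(dplyr)
library(tidyr)

# Hypothetical team-tournament rows; names and values are illustrative only.
teams <- tibble(
  squad       = c("A", "B", "C", "D", "E"),
  goals_per90 = c(0.0, 0.7, 1.3, 2.0, 3.6),
  possession  = c(41.0, 47.5, NA, 52.9, 56.0)  # NA mimics a missing record
)

# Reshape to one row per team and KPI, then summarise each KPI.
kpi_summary <- teams %>%
  pivot_longer(c(goals_per90, possession),
               names_to = "kpi", values_to = "value") %>%
  group_by(kpi) %>%
  summarise(
    mean = mean(value, na.rm = TRUE),
    sd   = sd(value, na.rm = TRUE),
    min  = min(value, na.rm = TRUE),
    max  = max(value, na.rm = TRUE),
    .groups = "drop"
  )

print(kpi_summary)
```

Setting `na.rm = TRUE` drops missing values within each KPI rather than removing whole rows, which matters when a metric such as possession is unavailable for some tournaments.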

1.2 Tournament Progression

Figure 1 shows the number of teams by matches played across Women’s World Cup tournaments. The majority of teams played three matches, indicating elimination at the group stage. The number of teams decreases steadily as matches played increases, reflecting the progressive knockout structure of the competition. Only a small proportion of teams played six or seven matches, representing those that reached the semi-finals and final stages.

The distribution highlights that deep tournament progression is achieved by relatively few teams. As a result, matches played provides a meaningful proxy measure of team success and will be used as the primary outcome variable in the subsequent analysis of performance factors.
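A distribution of this kind can be obtained by counting team-tournament rows at each value of matches played. The sketch below shows one possible approach; the data frame `teams`, the column `matches` and the values are hypothetical, and the plotting choices are assumptions rather than a reproduction of Figure 1.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data: one row per team-tournament observation.
teams <- tibble(
  squad   = paste0("Team", 1:12),
  matches = c(3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 6, 7)
)

# Number of teams at each level of tournament progression.
teams_by_matches <- teams %>%
  count(matches, name = "n_teams")

# Bar chart of the counts; matches is treated as a category.
p <- ggplot(teams_by_matches, aes(x = factor(matches), y = n_teams)) +
  geom_col() +
  labs(x = "Matches played", y = "Number of teams")

print(teams_by_matches)
```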

1.3 Distribution of Goals per 90 by Tournament Progression

Each point in Figure 2 represents a team’s performance in a specific Women’s World Cup tournament. The horizontal axis shows the number of matches played, which is used as a measure of tournament progression, while the vertical axis represents goals scored per 90 minutes as an indicator of attacking efficiency. The slight horizontal dispersion of points within each category is applied only to improve visibility and does not carry analytical meaning.

The distribution shows that teams progressing to later stages generally achieve higher goals per 90 values, with a greater concentration of high-scoring observations among teams that played six or seven matches. However, there is noticeable overlap between progression levels, indicating that while stronger attacking performance is associated with success, it is not the sole factor determining tournament advancement. Overall, the figure suggests that higher scoring efficiency increases the likelihood of deeper progression, but other performance dimensions also contribute to tournament outcomes.
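The horizontal dispersion described above is commonly produced with jittering. The following sketch assumes ggplot2's `geom_jitter()` and uses hypothetical data; it is not necessarily how Figure 2 was drawn. Setting `height = 0` keeps the goals per 90 values exact, so only the horizontal position within each category is perturbed.

```r
library(ggplot2)

# Hypothetical data; column names and values are illustrative only.
teams <- data.frame(
  matches     = c(3, 3, 3, 4, 4, 5, 6, 7),
  goals_per90 = c(0.3, 1.0, 0.7, 1.3, 0.9, 1.8, 2.4, 3.0)
)

set.seed(1)  # makes the random horizontal offsets reproducible

# Jitter horizontally only: width spreads points within a category,
# height = 0 leaves the plotted goals per 90 values unchanged.
p <- ggplot(teams, aes(x = factor(matches), y = goals_per90)) +
  geom_jitter(width = 0.15, height = 0) +
  labs(x = "Matches played", y = "Goals per 90")
```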

1.4 Chance Creation and Team Success

Figure 3 illustrates the distribution of assists per 90 across different levels of tournament progression. Teams that played a greater number of matches generally display higher median assist values, indicating that more successful teams tend to create more goal-scoring opportunities.

Nevertheless, there is noticeable variation within each group and overlap between progression levels. This suggests that while chance creation is associated with success, it is not the sole determinant of tournament performance and should be considered alongside other performance indicators.

1.5 Discipline and Team Success

Figure 4 shows the distribution of total disciplinary actions (yellow and red cards combined) across different levels of tournament progression. Teams that played a higher number of matches generally accumulated more total cards, which is likely influenced by their greater exposure due to playing more games.

This suggests that total card counts reflect match volume as well as discipline. Therefore, disciplinary metrics should be interpreted cautiously when comparing teams across different levels of tournament progression.
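The exposure effect can be made concrete with a small worked example. The report analyses total cards; the per-match rate below is not part of the original analysis and is shown only to illustrate why totals and rates can rank teams differently. The data frame `teams`, its columns and values are hypothetical.

```r
library(dplyr)

# Hypothetical teams with different numbers of matches played.
teams <- tibble(
  squad       = c("A", "B", "C"),
  matches     = c(3, 5, 7),
  total_cards = c(6, 9, 10)
)

# Dividing by matches played removes the effect of match volume.
teams_rate <- teams %>%
  mutate(cards_per_match = total_cards / matches)

print(teams_rate)
```

In this example the team with the most cards in total (C) has the lowest rate per match, which is the pattern that makes raw totals hard to compare across progression levels.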

1.6 Top Performing Teams

Top 10 Teams by Tournament Progression
Squad         Year   Matches   Goals per 90   Possession (%)
USA           2019   7         3.6            56.0
Germany       2015   7         2.6            54.7
USA           2015   7         2.0            52.9
England       2019   7         1.9            60.4
Sweden        2019   7         1.6            48.4
Netherlands   2019   7         1.5            57.1
England       2015   7         1.4            48.1
Japan         2015   7         1.3            55.1
USA           1991   6         4.7            Not available
Germany       2003   6         4.0            51.8

This table presents the ten highest-performing team–tournament observations, ranked first by matches played and then by goals scored per 90 minutes. Ordering the table in this way prioritises tournament progression as the primary indicator of success while also highlighting attacking effectiveness among teams that reached the later stages.
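The two-level ordering can be sketched with dplyr. The rows below are the ten observations from the table, entered in a scrambled order so that the sort is visible; the object and column names are assumptions made for the example.

```r
library(dplyr)

# The ten table rows in arbitrary order (possession is NA where not available).
top_rows <- tibble(
  squad = c("Japan", "USA", "Germany", "England", "USA",
            "Sweden", "Germany", "Netherlands", "England", "USA"),
  year = c(2015, 1991, 2003, 2015, 2015, 2019, 2015, 2019, 2019, 2019),
  matches = c(7, 6, 6, 7, 7, 7, 7, 7, 7, 7),
  goals_per90 = c(1.3, 4.7, 4.0, 1.4, 2.0, 1.6, 2.6, 1.5, 1.9, 3.6),
  possession = c(55.1, NA, 51.8, 48.1, 52.9, 48.4, 54.7, 57.1, 60.4, 56.0)
)

# Sort by matches played first, then by goals per 90, both descending.
ranked <- top_rows %>%
  arrange(desc(matches), desc(goals_per90)) %>%
  slice_head(n = 10)

print(ranked)
```

Because matches played takes precedence, USA 1991 and Germany 2003 rank ninth and tenth despite having the highest goals per 90 values in the table.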

The majority of the top observations correspond to teams that played seven matches, indicating progression to the final stages of the tournament. Within this group, goals per 90 ranges from 1.3 to 3.6, and most observations sit above the overall mean of 1.38, although the lowest values (1.3 to 1.4) are close to it. This is in line with the earlier finding that scoring efficiency is associated with deeper tournament advancement.

Several teams, particularly the United States, Germany and England, appear multiple times across different tournament years, reflecting sustained high performance over time. In addition, seven of the nine observations with possession data exceed the overall mean of 49.25%, suggesting that successful teams often combine effective attacking output with an ability to control matches.

Overall, the table highlights a clear performance profile associated with tournament success: deep progression, high scoring efficiency and, in many cases, strong game control. These descriptive findings are consistent with the correlation, hypothesis testing and regression results presented in the following sections.

1.7 Relationship Between Performance Metrics

Figure 5 presents the correlation matrix illustrating the relationships between key team performance indicators. Tournament progression, measured by matches played, shows the strongest positive relationship with Goals per 90 (r = 0.62), indicating that teams with higher scoring efficiency were more likely to progress further in the competition. A moderate positive association is also observed between Matches played and Assists per 90 (r = 0.47), suggesting that the ability to create goal-scoring opportunities contributes to tournament success.

In contrast, Possession (%) shows a weaker relationship with Matches played (r = 0.36), indicating that controlling the ball is less strongly associated with progression compared with attacking output.

Among the performance variables, Goals per 90 and Assists per 90 display a strong positive correlation (r = 0.60), reflecting the close link between chance creation and goal scoring. Possession (%) demonstrates weaker relationships with both Goals per 90 (r = 0.40) and Assists per 90 (r = 0.29), suggesting that while teams with higher possession may create and score more, the relationship is relatively limited.

Overall, the results indicate that attacking effectiveness—particularly goal scoring—shows the strongest association with tournament success, whereas possession appears to play a secondary role.
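A correlation matrix of this kind can be computed with base R's `cor()`. The sketch below uses Pearson correlations on hypothetical data; the column names and values are assumptions. The report does not state how missing values were handled, so the use of pairwise complete observations here is also an assumption.

```r
# Hypothetical data; one missing possession value mimics an unavailable record.
teams <- data.frame(
  matches       = c(3, 3, 3, 4, 4, 5, 6, 7),
  goals_per90   = c(0.3, 1.0, 0.7, 1.3, 0.9, 1.8, 2.4, 3.0),
  assists_per90 = c(0.0, 0.7, 0.3, 1.0, 0.4, 1.1, 1.5, 2.1),
  possession    = c(41, 47, NA, 52, 46, 55, 53, 58)
)

# Pearson correlations; each pair of variables uses the rows where both
# values are present, so a single NA does not discard the whole row.
cor_mat <- cor(teams, use = "pairwise.complete.obs", method = "pearson")

print(round(cor_mat, 2))
```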

1.8 Performance Factors Associated with Tournament Success: Hypothesis Testing

Performance Differences Between High and Low Progression Teams
Metric           High (6–7 matches)   Low (3–5 matches)   p-value
Goals per 90     2.12                 1.03                < 0.001
Assists per 90   1.03                 0.54                0.005
Possession (%)   52.14                48.49               0.011

This section evaluates whether key performance metrics differ between more successful and less successful teams using formal hypothesis testing. Tournament success is operationalised using matches played, where teams that played 6–7 matches are classified as High progression (semi-finalists/finalists), and teams that played 3–5 matches are classified as Low progression (earlier elimination). This grouping provides a practical way to compare teams based on how far they advanced in the tournament.

For each performance metric (goals per 90, assists per 90 and possession), an independent two-sample t-test was conducted to compare the mean value between the High and Low progression groups. An independent t-test is appropriate here because the two groups consist of different teams (i.e., observations are independent) and the objective is to assess whether the average performance level differs between success categories. The Welch two-sample t-test was used because it does not require the assumption of equal variances between groups and is therefore a more robust choice for real-world sports performance data, where variability often differs between high-performing and lower-performing teams.

The results indicate that High progression teams recorded a substantially higher mean goals per 90 (2.12) compared with Low progression teams (1.03), with the difference being statistically significant (p < 0.001). This provides strong evidence that scoring efficiency is associated with deeper tournament progression. High progression teams also produced a higher mean assists per 90 (1.03 vs 0.54; p = 0.005), suggesting that chance creation is similarly linked to success. Possession was also higher on average for High progression teams (52.14% vs 48.49%; p = 0.011), although the difference is smaller in magnitude, implying that ball control may contribute to success but is less influential than attacking output. Overall, the hypothesis tests support the exploratory findings that attacking effectiveness (goals and assists) is the strongest performance characteristic associated with tournament success, while possession appears to play a secondary role.
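The grouping and test described in this section can be sketched in base R as follows. The data are hypothetical and the names `teams` and `progression` are assumptions. `t.test()` performs the Welch test when `var.equal = FALSE`, which is also its default.

```r
# Hypothetical data; values are illustrative only.
teams <- data.frame(
  matches     = c(3, 3, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7),
  goals_per90 = c(0.3, 1.0, 0.7, 1.3, 0.9, 1.2, 1.0, 1.8, 2.4, 1.6, 3.0, 2.0)
)

# Classify teams: 6-7 matches = High progression, 3-5 matches = Low.
teams$progression <- ifelse(teams$matches >= 6, "High", "Low")

# Welch two-sample t-test: does not assume equal variances between groups.
welch <- t.test(goals_per90 ~ progression, data = teams, var.equal = FALSE)

print(welch$estimate)  # group means
print(welch$p.value)   # two-sided p-value
```

The same call can be repeated for assists per 90 and possession by changing the left-hand side of the formula.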

1.9 Performance Profiles of High and Low Progression Teams

The variable contribution plot from the principal component analysis illustrates how each performance metric relates to the two main dimensions identified in the data. The direction and length of each arrow indicate the strength and nature of the relationship between the variable and the principal components. Variables that point in a similar direction are positively associated, while those pointing in opposite directions represent contrasting performance characteristics.

The first principal component (Dim1), which explains 61.2% of the total variance, is primarily driven by attacking performance. Goals per 90 and assists per 90 show strong positive contributions and are closely aligned, indicating that teams that create more chances also tend to score more. Possession also contributes positively to this dimension, although to a lesser extent. In contrast, total disciplinary actions (yellow and red cards combined) load in the opposite direction, suggesting that higher card counts are associated with weaker overall performance profiles.

The second principal component (Dim2), explaining 22.6% of the variance, appears to capture a secondary dimension related mainly to discipline, as total cards show the strongest vertical contribution compared with the other variables. This indicates that disciplinary behaviour represents a distinct performance characteristic that varies independently from attacking effectiveness.

Overall, the variable structure suggests that the dominant performance dimension in the dataset reflects attacking efficiency and offensive productivity, while discipline represents a separate and less influential factor. Together with the PCA map of teams, these results indicate that successful teams are primarily distinguished by stronger attacking output rather than differences in possession or disciplinary behaviour.
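The quantities discussed above (variance explained per dimension and variable loadings) can be obtained with base R's `prcomp()`. This is a sketch on hypothetical data and may differ from the package used to draw the report's PCA plots; the column names and values are assumptions. Variables are standardised because they are measured on different scales.

```r
# Hypothetical data; prcomp() needs complete rows, so no NA values here.
teams <- data.frame(
  goals_per90   = c(0.3, 1.0, 0.7, 1.3, 0.9, 1.8, 2.4, 3.0),
  assists_per90 = c(0.0, 0.7, 0.3, 1.0, 0.4, 1.1, 1.5, 2.1),
  possession    = c(41, 47, 44, 52, 46, 55, 53, 58),
  total_cards   = c(4, 6, 5, 3, 7, 9, 8, 10)
)

# Centre and scale each variable before extracting components.
pca <- prcomp(teams, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

print(round(var_explained, 3))
print(round(pca$rotation[, 1:2], 2))  # loadings on the first two components
```

The sign of a component is arbitrary, so loadings should be read in terms of which variables point the same way rather than whether they are positive or negative.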

1.10 Modelling Tournament Success

Regression Results: Factors Associated with Tournament Progression
Predictor        Coefficient (β)   Std. Error   t-value   p-value
Goals per 90     1.43              0.40         3.55      < 0.001
Assists per 90   -0.60             0.51         -1.18     0.242
Possession (%)   0.01              0.02         0.52      0.607
Total Cards      -0.09             0.06         -1.49     0.141

The regression analysis examines the simultaneous effect of multiple performance indicators on tournament progression, measured by matches played. The results show that goals per 90 is the only statistically significant predictor of success (β = 1.43, p < 0.001). This indicates that, holding other factors constant, higher scoring efficiency is strongly associated with advancing further in the competition.

In contrast, assists per 90 and possession do not show statistically significant effects, suggesting that their independent contribution to tournament progression is limited once goal scoring is taken into account. The coefficient for assists is negative but not significant, which likely reflects its close relationship with goals rather than a true negative effect. Similarly, total cards does not significantly influence progression, indicating that disciplinary factors play a relatively minor role compared with attacking performance.

Overall, the regression results reinforce the earlier exploratory findings by demonstrating that goal-scoring efficiency is the primary performance factor associated with tournament success, while other indicators such as possession, chance creation and discipline appear to have a weaker or indirect influence when considered jointly.
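A model with this structure can be sketched with base R's `lm()`, assuming an ordinary least squares fit; the t-values in the table suggest this, but the report does not state the estimation method. The data, object names and column names below are hypothetical.

```r
# Hypothetical data; values are illustrative only.
teams <- data.frame(
  matches       = c(3, 3, 3, 3, 4, 4, 5, 5, 6, 7),
  goals_per90   = c(0.3, 1.0, 0.7, 1.3, 0.9, 1.5, 1.2, 1.8, 2.4, 3.0),
  assists_per90 = c(0.0, 0.7, 0.3, 1.0, 0.4, 0.9, 0.9, 1.1, 1.5, 2.1),
  possession    = c(41, 47, 44, 52, 46, 50, 49, 55, 53, 58),
  total_cards   = c(4, 6, 5, 3, 7, 6, 8, 9, 8, 10)
)

# Matches played regressed on all four performance indicators jointly.
fit <- lm(matches ~ goals_per90 + assists_per90 + possession + total_cards,
          data = teams)

# Coefficient table: estimate, standard error, t value and p-value.
coef_table <- summary(fit)$coefficients

print(round(coef_table, 3))
```

Each t value in the output is the estimate divided by its standard error, which is how the columns of the regression table relate to one another.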

1.11 Conclusion

This study conducted an exploratory data analysis of team performance across multiple FIFA Women’s World Cup tournaments with the aim of identifying the factors associated with tournament success. Tournament progression, measured by matches played, was used as a practical indicator of team performance, allowing comparisons between teams eliminated in earlier rounds and those reaching the final stages.

The descriptive analysis showed that the majority of teams were eliminated at the group stage and that only a small proportion played six or seven matches, confirming that deep tournament advancement is achieved by relatively few high-performing teams. Across the performance indicators examined, attacking metrics consistently showed the strongest relationship with progression. Teams that played more matches generally recorded higher values for both goals per 90 and assists per 90, suggesting that offensive effectiveness and chance creation are key characteristics of successful teams.

Correlation analysis further supported these findings, with matches played showing the strongest positive association with goals per 90 and a moderate relationship with assists per 90. Possession displayed only a weaker relationship with progression, indicating that controlling the ball alone is not sufficient to ensure success. The principal component analysis reinforced this pattern, identifying a dominant performance dimension driven primarily by attacking output, while disciplinary actions formed a secondary and less influential dimension.

Formal hypothesis testing confirmed that teams reaching the later stages of the tournament significantly outperformed early-eliminated teams in terms of goals per 90, assists per 90 and, to a lesser extent, possession. However, when all performance variables were considered simultaneously in the regression model, goals per 90 emerged as the only statistically significant independent predictor of tournament progression. This suggests that while several performance indicators are associated with success at a descriptive level, scoring efficiency is the key factor that most clearly distinguishes the most successful teams.

Overall, the results indicate that tournament success in the Women’s World Cup is most strongly associated with attacking effectiveness, particularly goals scored per 90 minutes. Possession and discipline appear to play supporting roles but are less influential once attacking performance is taken into account.

It is important to acknowledge that the analysis is based on team-level summary statistics and does not account for contextual factors such as opponent strength, match location or tactical differences. In addition, tournament formats and competitive balance have evolved over time, which may influence performance comparisons across different years. Future research could extend this analysis by incorporating more detailed match-level data or advanced performance metrics.

Despite these limitations, the findings provide a clear performance profile associated with success at the highest level of women’s international football. From a practical perspective, the results highlight the importance of offensive efficiency and clinical finishing as the most critical factors for teams aiming to achieve deep progression in major international tournaments.